The University of Texas at Austin

Honors Theses


Houston Putman(1). "Exploiting Historical Context in Irregular Data Prefetching". The University of Texas at Austin, Department of Computer Science(1). Report# HR-16-10 (honors theses). July 12th, 2016. 17 pages.

In this paper we propose a new irregular data prefetcher, the Temporal Sparse Distributed Memory (TSDM), which builds on address correlation and PC localization. Address correlation, which learns correlations between pairs of addresses, is the inspiration behind the current leader in irregular data prefetching. TSDM distinguishes itself from existing address-correlation techniques by including correlation information from an entire stream of addresses in the prediction step. This combination of correlation information reproduces the historical context of the stream. We show that the use of historical context gives TSDM a significant advantage over conventional address correlation on micro-benchmarks that exhibit high levels of aliasing. TSDM fares slightly better on irregular benchmarks than the current irregular prefetching leader, and it performs similarly when run on a restricted hardware budget.
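
For contrast with TSDM's stream-based context, conventional address correlation can be sketched as a table of observed successor pairs. The single-successor table below is a minimal illustration of the baseline idea, not any specific published design:

```python
class CorrelationPrefetcher:
    """Minimal address-correlation sketch: remember, for each address,
    the address that most recently followed it, and prefetch that
    successor the next time the address is seen."""

    def __init__(self):
        self.successor = {}   # addr -> last observed next address
        self.last_addr = None

    def access(self, addr):
        """Record addr in the stream and return a prefetch candidate, if any."""
        if self.last_addr is not None:
            self.successor[self.last_addr] = addr  # learn the (prev, addr) pair
        self.last_addr = addr
        return self.successor.get(addr)  # predict what followed addr last time
```

On a repeating stream A, B, C, A, B, C the table learns the pairs on the first pass and predicts correctly on the second; because each entry keys on a single address rather than a stream of addresses, two streams that share an address alias in the table, which is exactly the weakness historical context targets.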

DOWNLOAD TR-2235.pdf


Matthew Allen(1). "Parametric Polymorphism in the Go Programming Language". The University of Texas at Austin, Department of Computer Science(1). Report# HR-16-09 (honors theses). June 15th, 2016. 27 pages.

An extension to the Go language was developed that introduces parametric polymorphism in the form of generic functions. The changes to the language and the compiler needed to implement the type system extensions are discussed, and alternative implementation strategies are described. The resulting implementation of generic functions is backwards compatible with the existing Go standard and is consistent with the design goals of the language. The overhead of the current prototype implementation is assessed based on two commonly used programming patterns.

DOWNLOAD TR-2231.pdf


Chance O. Raine(1). "Branch Prediction with CPR". The University of Texas at Austin, Department of Computer Science(1). Report# HR-16-08 (honors theses). May 20th, 2016. 14 pages.

In this paper we introduce the Contents-Perceptron-Region (CPR) branch predictor, which is loosely based on Sparse Distributed Memory (SDM), an AI technique from the 1980s. SDM allows us to expand the design space of branch prediction by adding fuzzy history matching and context-based predictions. CPR makes predictions using a history of branch addresses rather than branch outcomes, as most predictors do, which allows it to potentially reduce aliasing. Perceptrons allow CPR to utilize long histories effectively, and regional history weights help reduce the amount of noise in the output. We describe the operation of CPR and evaluate it against well-known predictors. Although the CPR predictor's accuracy is far behind that of current state-of-the-art predictors, it beats the original perceptron predictor by 0.65 mispredictions per thousand instructions. We describe possible explanations for these results and future additions to the predictor that may increase performance.
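
As a reference point for the perceptron baseline that CPR is measured against, here is a minimal global-history perceptron predictor in the style of Jimenez and Lin. The table size, history length, and training threshold are illustrative choices, not the thesis's configuration:

```python
HIST_LEN = 8
TABLE_SIZE = 64
THETA = int(1.93 * HIST_LEN + 14)  # training threshold from the original perceptron paper

class PerceptronPredictor:
    def __init__(self):
        # one weight vector (bias + one weight per history bit) per table entry
        self.weights = [[0] * (HIST_LEN + 1) for _ in range(TABLE_SIZE)]
        self.history = [1] * HIST_LEN  # +1 = taken, -1 = not taken

    def _output(self, pc):
        w = self.weights[pc % TABLE_SIZE]
        return w[0] + sum(wi * hi for wi, hi in zip(w[1:], self.history))

    def predict(self, pc):
        return self._output(pc) >= 0  # True means "predict taken"

    def update(self, pc, taken):
        y = self._output(pc)
        t = 1 if taken else -1
        w = self.weights[pc % TABLE_SIZE]
        # train only on a misprediction or when confidence is below theta
        if (y >= 0) != taken or abs(y) <= THETA:
            w[0] += t
            for i in range(HIST_LEN):
                w[i + 1] += t * self.history[i]
        self.history = self.history[1:] + [t]  # shift in the new outcome
```

Because the prediction is a dot product of weights with the outcome history, long histories contribute usefully even when individual bits are noisy; CPR keeps the perceptron machinery but feeds it branch addresses instead of outcome bits.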

DOWNLOAD TR-2229.pdf


Josh Eversmann(1). "Improving Digital Side Channel Protection with Static Analysis Techniques". The University of Texas at Austin, Department of Computer Science(1). Report# HR-16-07 (honors theses). May 17th, 2016. 11 pages.

Side-channel attacks monitor running programs to extract sensitive information from them. Escort combats a broad class of such attacks by performing compile-time transformations that hide the data processed by the resulting binary. Escort focuses on protecting floating-point operations. We extend Escort to transform C programs with pointers or complicated control flow that were formerly unusable.

DOWNLOAD TR-2228.pdf


Patrick Haley(1). "The Evolution of Language Groups among Cooperating Digital Predators". The University of Texas at Austin, Department of Computer Science(1). Report# HR-16-06 (honors theses). May 12th, 2016. 28 pages.

Many species of animals have evolved complex means for communicating with one another. Oftentimes, communication is essential for the execution of tasks that require cooperation between individuals, such as group hunting and mate selection. As a result, communication itself becomes essential for survival. While these facts are readily observed, the evolutionary processes underlying them are less understood, in large part because observational - much less controlled - studies of these processes are impossible. Both the timescales and population sizes required for such studies are simply too great. To address these problems, this thesis uses simulated predators to study the evolution of language in animals. These digital predators evolve to perform two cooperative tasks: hunting and mate selection. After the populations of predators have evolved to perform both tasks successfully, the population is decomposed into both language groups and cooperative groups. Spectral clustering identifies predators that speak similar languages, while merge clustering is used to find those groups of predators that are the most successful when working together. Analysis of the groups generated by these two different methods shows that the most successful pairings are not necessarily those in which the two individuals are speaking the same language. Rather, organisms can evolve to speak a different language than the one to which they respond. Moreover, even though one task -- mate selection -- evolves earlier in evolutionary history, the language diversity it produces counteracts any head-start provided for the evolution of the second task. Thus, not only is language important for the evolution of cooperative task success, but the appearance of language groups can also play a determining role in the evolution of cooperation.

DOWNLOAD TR-2225.pdf


Paul Khermouch(1). "Heterogeneous Animated Path-Tracing Using Automatic Granularity Control". The University of Texas at Austin, Department of Computer Science(1). Report# HR-16-05 (honors theses). May 12th, 2016. 18 pages.

Heterogeneous computing is an exciting new field that aims to make use of all processors on a given architecture in order to perform general-purpose computing as efficiently as possible. However, designing a programming framework to do this that is fast, portable, and responsive to problem behavior, while being easy to program and introducing minimal overhead, has proven challenging for many reasons. DICE is one such system that has shown promise; here, I show that DICE can effectively load-balance multiple different types of workloads among different device types, and adapt to irregularity in work runtimes.

DOWNLOAD TR-2224.pdf


I-Hsun Liao(1). "Linked-data cache with provenance and optimistic execution of linked-data queries". The University of Texas at Austin, Department of Computer Science(1). Report# HR-16-04 (honors theses). May 11th, 2016. 40 pages.

Diamond is a linked-data query system based on the Rete match algorithm. In this paper we introduce the architecture of the Diamond cache. The Rete match, originally intended for the implementation of forward-chaining rule systems, is able to incrementally and non-monotonically evaluate queries. Thus, we introduce the optimistic execution of linked-data queries. That is, when possible, URIs are concurrently fetched from the cache and dereferenced. The query system quickly computes results from the cached contents. Subsequently, triples returned by the dereferenced URIs are compared with the cached copies. If the cached copies are current, no action need be taken. Then, we introduce eventual execution, an extension of optimistic execution: if the aforementioned cached triples have become stale, the current versions of the triples are forwarded to the Rete network and the query result is updated accordingly. The cache is built using a conventional triple store. The cache contents are modeled using the standard Open Provenance Model. The expectation is that in future versions of the system the provenance of individual triples may be integrated into the behavior of the cache-replacement algorithm.

DOWNLOAD TR-2223.pdf


Tyler Yates(1). "Automated Generation of Questions from Factual, Natural Language Sentences". The University of Texas at Austin, Department of Computer Science(1). Report# HR-16-03 (honors theses). May 11th, 2016. 49 pages.

This paper describes methods to examine factual, natural language sentences in order to generate factual, natural language questions whose answers are found in those sentences. These methods rely on rules that look for patterns in the syntactic structure, named entities, and grammatical relations of a sentence. The extraction of these pieces of information from a sentence is not the focus of this paper; rather, we examine how they can be used for factual, natural language question generation. The paper also presents an implementation of these methods in a computer program called Specimus, describes the implementation of the system, and evaluates its output.
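
In the spirit of the rule-based approach described, though not Specimus's actual rules, a single toy rule might map a subject-verb-object pattern to a question by substituting an interrogative chosen from the subject's named-entity type:

```python
def generate_question(subject, verb, obj, subject_is_person):
    """Toy pattern rule: drop the subject of an S-V-O sentence and
    replace it with an interrogative keyed by its entity type.
    The rule and the entity-type flag are illustrative only; a real
    system derives them from a parse and a named-entity tagger."""
    wh = "Who" if subject_is_person else "What"
    return f"{wh} {verb} {obj}?"
```

For example, from "Edison invented the phonograph" with the subject tagged PERSON, the rule yields "Who invented the phonograph?", whose answer is the subject of the source sentence.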

DOWNLOAD TR-2222.pdf


Tyler O'Meara(1). "Securing DNSSEC Against State Level Adversaries". The University of Texas at Austin, Department of Computer Science(1). Report# HR-16-02 (honors theses). May 6th, 2016. 33 pages.

The security of data transmitted on the Internet is dependent on the integrity of the Domain Name System (DNS). A large amount of work has gone into securing DNS, particularly the Domain Name System Security Extensions (DNSSEC), which protect against traditional off-path and man-in-the-middle attacks. However, DNSSEC relies on the assumption that all of the name servers used for resolution are honest. We extend the threat model of DNSSEC to include state-level adversaries that undermine the security of some of these name servers. As a result, we remove the assumption that the name servers used for resolution are trustworthy. We discuss the infeasibility of a secure protocol that does not require trusting a third party and instead introduce a mechanism for obtaining distributed trust from a set of mutually untrustworthy agents. We then integrate this mechanism into DNSSEC in order to use distributed trust to ensure the validity of DNS responses, even while under attack by state-level adversaries.
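
As a toy illustration of the distributed-trust idea (not the thesis's actual protocol), a resolver could accept an answer only when a quorum of independently chosen, mutually untrusting name servers agree on it, so that a state-level adversary must compromise the quorum rather than any single server:

```python
from collections import Counter

def resolve_with_quorum(responses, threshold):
    """Accept a DNS answer only if at least `threshold` of the queried
    resolvers returned it; otherwise reject the resolution.
    The simple majority-count rule here is an illustrative stand-in
    for the thesis's distributed-trust mechanism."""
    answer, count = Counter(responses).most_common(1)[0]
    return answer if count >= threshold else None
```

With four resolvers of which one is compromised, a threshold of three still returns the honest answer; raising the threshold trades availability for a stronger adversary bound.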

DOWNLOAD TR-2221.pdf


Mark Mansi(1), Calvin Lin(1). "An Improved Hybrid AMPM and ISB prefetcher". The University of Texas at Austin, Department of Computer Science(1). Report# HR-16-01 (honors theses). May 9th, 2016. 8 pages.

In their recent work, Jain and Lin propose a hybrid prefetcher composed of the Access Map Pattern Matching prefetcher (AMPM) and the Irregular Stream Buffer (ISB), two state-of-the-art prefetchers. While their hybrid achieves high performance, it can also waste memory bandwidth by making many inaccurate prefetches, which can hurt system performance in bandwidth-constrained environments. We aim to improve their work on two fronts. First, our design improves the accuracy of AMPM. Second, our design identifies whether the current workload is regular or irregular and uses this information to toggle between AMPM and ISB in the hybrid. While our design does not fully meet these goals, we do show that it improves AMPM accuracy by 7% while decreasing coverage by only 3%. Our solution improves the hybrid's accuracy by 4% and reduces its bandwidth consumption by 8% on average, though it decreases speedup by 17% compared with the original hybrid.

DOWNLOAD TR-2220.pdf


Karen C. Tsai(1). "On Configuring Distributed Memory Process Grids for Tensor Contraction Applications". The University of Texas at Austin, Department of Computer Science(1). Report# HR-15-09 (honors theses). July 8th, 2015. 21 pages.

Tensor contractions are important computations for many domains including quantum chemistry and big-data applications. Distributed-memory architectures, architectures where numerous processing elements have private memory, are used in order to solve large problems efficiently. On such systems, achieving high performance computation is dependent on the arrangement of the processing elements. The optimal configuration is typically a nontrivial function of the network topology and application implementation. In this paper, we investigate this relationship between network topology and the communication pattern of the application and develop a framework towards automatically determining the near-optimal arrangement of processing elements for a given application on a distributed-memory architecture.
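
To make the configuration problem concrete, here is a toy search over 2-D process grids under an illustrative cost model in which per-process boundary communication scales as m/r + n/c for an r x c grid on an m x n problem. The real framework models the network topology and the application's communication pattern, which this sketch deliberately omits:

```python
def best_process_grid(p, m, n):
    """Enumerate the factorizations r * c == p and return the grid shape
    (r, c) minimizing a toy communication cost m/r + n/c.
    The cost model is an assumption for illustration, not the thesis's."""
    best = None
    for r in range(1, p + 1):
        if p % r:
            continue  # r must divide the process count
        c = p // r
        cost = m / r + n / c
        if best is None or cost < best[0]:
            best = (cost, (r, c))
    return best[1]
```

Even this toy model captures the paper's point that the optimal arrangement is shape-dependent: a square problem on 4 processes favors a 2 x 2 grid, while a tall, skinny problem pushes the grid toward one long dimension.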

DOWNLOAD TR-2202.pdf


Jim Given(1). "Building Replicated Systems for Performance". The University of Texas at Austin, Department of Computer Science(1). Report# HR-15-08 (honors theses). May 26th, 2015. 38 pages.

Building large replicated systems is a complicated task in which suboptimal implementation and configuration choices can potentially create performance bottlenecks. This thesis demonstrates an approach to making optimal choices by implementing and configuring a multithreaded pipeline in a novel state machine replication framework, Adam. Using this approach, the multithreaded pipeline is able to achieve near-optimal performance, significantly increasing resource utilization.

DOWNLOAD TR-2200.pdf


Mark Fulbright(1). "An Evaluation of Two Approaches to Parsing". The University of Texas at Austin, Department of Computer Science(1). Report# HR-15-07 (honors theses). May 26th, 2015. 21 pages.

Parsing a string to determine its underlying structure is a problem of central importance in computer science. Earley's algorithm is a well-known solution, but it is widely accepted that the algorithm is difficult to understand and tricky to implement correctly. Pingali proposed a new approach to parsing that is equivalent to Earley's algorithm but much easier to reason about. In this thesis, we present a typical implementation of Earley's algorithm and an implementation of Pingali's approach side by side, and discuss the differences in implementing each. We also test both implementations, as well as an existing implementation of Earley's algorithm, and compare their performance.
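
To ground the comparison, a recognizer version of Earley's algorithm fits in a few dozen lines; the grammar encoding below is an illustrative choice, and nullable (epsilon) productions are deliberately unhandled, one of the subtleties that makes fully correct implementations tricky:

```python
def earley_recognize(grammar, start, tokens):
    """Earley recognizer. grammar maps each nonterminal to a list of
    right-hand sides (tuples of symbols); a symbol is a terminal iff it
    is not a grammar key. A chart state is (lhs, rhs, dot, origin).
    Note: epsilon productions are not handled."""
    chart = [set() for _ in range(len(tokens) + 1)]
    for rhs in grammar[start]:
        chart[0].add((start, rhs, 0, 0))
    for i in range(len(tokens) + 1):
        worklist = list(chart[i])
        while worklist:
            lhs, rhs, dot, origin = worklist.pop()
            if dot < len(rhs):
                sym = rhs[dot]
                if sym in grammar:                       # predictor
                    for prod in grammar[sym]:
                        new = (sym, prod, 0, i)
                        if new not in chart[i]:
                            chart[i].add(new)
                            worklist.append(new)
                elif i < len(tokens) and tokens[i] == sym:  # scanner
                    chart[i + 1].add((lhs, rhs, dot + 1, origin))
            else:                                        # completer
                for l2, r2, d2, o2 in list(chart[origin]):
                    if d2 < len(r2) and r2[d2] == lhs:
                        new = (l2, r2, d2 + 1, o2)
                        if new not in chart[i]:
                            chart[i].add(new)
                            worklist.append(new)
    return any(lhs == start and dot == len(rhs) and origin == 0
               for (lhs, rhs, dot, origin) in chart[len(tokens)])
```

With the toy grammar E -> E + T | T, T -> n, the recognizer accepts "n + n" and rejects "n +"; the interplay of the predictor, scanner, and completer steps above is exactly the machinery that Pingali's formulation reorganizes.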

DOWNLOAD TR-2199.pdf


Crystal Riley(1). "Centroidal Voronoi Diagrams and Euler-Lagrangian Methods for Astrophysical Simulations". The University of Texas at Austin, Department of Computer Science(1). Report# HR-15-06 (honors theses). May 22nd, 2015. 44 pages.

Large-scale cosmological simulations such as galaxy and star formation are of great interest to cosmologists. These simulations model relativistic, compressible, discontinuous fluids. Many techniques, from particle-based methods to adaptive meshing, have been explored, each with advantages and disadvantages. Here, the limitations of centroidal Voronoi tessellations for these astrophysical fluids are investigated in detail, showing that centroidal Voronoi tessellations are suitable for use with such fluids. Furthermore, an ad hoc time-dependence technique for centroidal Voronoi tessellations is analyzed and shown to be unsuitable for large-scale simulation. Lastly, a new meshing technique, the Euler-Lagrange centroidal Voronoi tessellation, is introduced; it resembles a standard centroidal Voronoi tessellation but enforces a tessellation that advects with the fluid.

DOWNLOAD TR-2198.pdf


Ellis Michael(1). "Scaling Leader-Based Agreement Protocols for State Machine Replication". The University of Texas at Austin, Department of Computer Science(1). Report# HR-15-05 (honors theses). May 18th, 2015. 42 pages.

State machine replication is a technique used to guarantee the availability of a system even in the presence of faults. Agreement protocols are often used to implement state machine replication. However, the throughput of many agreement protocols, such as Paxos, does not scale with the number of machines available to the system. Systems whose throughput does scale often provide weaker consistency guarantees than state machine replication's linearizability. These weaker guarantees, such as eventual consistency, make the job of the application programmer much more difficult. Kapritsos and Junqueira previously proposed a protocol which uses an agreement protocol as a primitive and does scale. In this thesis, I describe my novel implementation of this protocol as well as the optimizations that were necessary to achieve scalability. The results of my experiments with this system show modest but promising evidence that the throughput of this system can, indeed, scale linearly with the number of machines.

DOWNLOAD TR-2197.pdf


William Morriss(1). "A Concurrent Dictionary and Wait-Free Queue from Compare and Swap". The University of Texas at Austin, Department of Computer Science(1). Report# HR-15-04 (honors theses). May 18th, 2015. 15 pages.

This paper describes algorithms for a lock-free dictionary and a wait-free queue. The queue improves on a previous result by Kogan and Petrank by not requiring stop-the-world garbage collection. The lock-free dictionary is structured as a multi-level hash table to avoid global resizes. It can be made wait-free for most practical key types with trivial modifications. The dictionary supports sequentially consistent compare-and-swap operations in addition to exchange and put, but does not support removal. It was over 3 times faster than a previous concurrent multi-level hash table. The data structures in this paper require a compare-and-swap primitive, found in many commodity architectures including x86, x64, and Itanium.
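
Python exposes no hardware compare-and-swap instruction, so the sketch below simulates the single-word primitive with a lock purely to illustrate the CAS retry-loop pattern that such structures are built from; it is not the paper's dictionary:

```python
import threading

class AtomicRef:
    """Simulates the single-word CAS primitive the paper assumes.
    Real implementations use a hardware instruction (e.g. CMPXCHG on
    x86); the lock here only stands in for that atomicity."""

    def __init__(self, value=None):
        self._value = value
        self._lock = threading.Lock()

    def get(self):
        return self._value

    def compare_and_swap(self, expected, new):
        with self._lock:
            if self._value is expected:
                self._value = new
                return True
            return False

def put_if_absent(slot, value):
    """The classic lock-free put-if-absent idiom: read, then attempt to
    install via CAS, retrying if another thread raced us. Mirrors the
    no-removal discipline of the paper's dictionary slots."""
    while True:
        cur = slot.get()
        if cur is not None:
            return cur  # another thread won the race; keep its value
        if slot.compare_and_swap(None, value):
            return value  # our install succeeded
```

Because a slot, once written, is never cleared, the winner of the CAS race is permanent; this is the same property that lets the paper's dictionary avoid removal while keeping operations sequentially consistent.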

DOWNLOAD TR-2196.pdf


Evan Ott(1). "Estimating Rainfall with Neural Networks and Conditional Random Fields". The University of Texas at Austin, Department of Computer Science(1). Report# HR-15-03 (honors theses). May 17th, 2015. 51 pages.


DOWNLOAD TR-2195.pdf


David Wetterau(1). "Communicating Replicated State Machines". The University of Texas at Austin, Department of Computer Science(1). Report# HR-15-02 (honors theses). May 15th, 2015. 42 pages.

State Machine Replication (SMR) is a method to provide a highly available distributed service that traditionally relies on deterministic sequential execution to guarantee that replicas all agree on the same final state. In addition to relying on sequential execution, many SMR protocols adhere to the client-server model that no longer fits many applications today. This thesis presents a novel approach to provide the replicated state machine abstraction in an environment with communicating services that leverages both the multicore nature of today's machines and pipelining techniques to improve performance and resource utilization.

DOWNLOAD TR-2194.pdf


Josh Slocum(1). "Efficient Fluid Simulation Algorithms". The University of Texas at Austin, Department of Computer Science(1). Report# HR-15-01 (honors theses). May 14th, 2015. 28 pages.

The special effects industry requires photo-realistic fluid simulation techniques to produce visually pleasing images. This involves capturing small scale visual details such as thin films and rolling motions, requiring high levels of computational discretization. Adaptive data structures are optimal for this type of calculation because only a small subset of the fluid requires the full computational discretization. This research presents an adaptive multigrid solver for the Navier-Stokes equations on a k-d tree, compared against the previous standard for an adaptive multigrid solver, an octree.

DOWNLOAD TR-2193.pdf


Mike Depinet(1), Patrick MacAlpine(1), Peter Stone(1). "Keyframe Sampling, Optimization, and Behavior Integration: A New Longest Kick in the RoboCup 3D Simulation League". The University of Texas at Austin, Department of Computer Science(1). Report# HR-14-05 (honors theses). July 25th, 2014. 22 pages.

Even with improvements in machine learning enabling robots to quickly optimize and perfect their skills, developing a seed skill from which to begin an optimization remains a necessary challenge for large action spaces. This thesis proposes a method for creating and using such a seed by i) observing the effects of the actions of another robot, ii) further optimizing the skill starting from this seed, and iii) embedding the optimized skill in a full behavior. Called KSOBI, this method is fully implemented and tested in the complex RoboCup 3D simulation domain. The main result is a kick that, to the best of our knowledge, kicks the ball farther in this simulator than has been previously documented.

DOWNLOAD TR-2177.pdf


Zal Bhathena(1). "Near Optimal Hierarchical Pathfinding using Triangulations". The University of Texas at Austin, Department of Computer Science(1). Report# HR-14-04 (honors theses). May 8th, 2014. 46 pages.

In game programming, A* is the standard algorithm used for pathfinding. For large search spaces, A* can consume large amounts of CPU and memory resources. HPA*, an algorithm that alleviates these problems, represents the search space as a grid and divides the grid into clusters, then performs an on-line search to find the clusters to travel through. This allows HPA* to calculate the majority of the absolute path asynchronously while the agent is in transit, reducing the start-up delay that would occur with standard A*. We present THPA*, an extension of HPA* that provides several features absent from the original: support for search-space modifications at run time, paths for non-zero-radius agents, and handling of polygonal obstacles.
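
THPA* layers on HPA*, which itself layers on plain A*. As a reference point, here is a minimal textbook A* on a 4-connected grid; the grid encoding and Manhattan heuristic are illustrative choices, not taken from the thesis:

```python
import heapq

def a_star(grid, start, goal):
    """Plain A* on a 4-connected grid (0 = free, 1 = blocked) with the
    Manhattan-distance heuristic. Returns the shortest path as a list
    of (row, col) cells, or None if the goal is unreachable."""
    rows, cols = len(grid), len(grid[0])
    h = lambda p: abs(p[0] - goal[0]) + abs(p[1] - goal[1])
    open_heap = [(h(start), 0, start)]  # (f = g + h, g, node)
    g = {start: 0}
    came = {}
    while open_heap:
        _, cost, cur = heapq.heappop(open_heap)
        if cur == goal:
            path = [cur]
            while cur in came:  # walk parent links back to start
                cur = came[cur]
                path.append(cur)
            return path[::-1]
        if cost > g.get(cur, float('inf')):
            continue  # stale heap entry
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nxt = (cur[0] + dr, cur[1] + dc)
            if (0 <= nxt[0] < rows and 0 <= nxt[1] < cols
                    and grid[nxt[0]][nxt[1]] == 0):
                ng = cost + 1
                if ng < g.get(nxt, float('inf')):
                    g[nxt] = ng
                    came[nxt] = cur
                    heapq.heappush(open_heap, (ng + h(nxt), ng, nxt))
    return None
```

On large maps this search touches every frontier cell, which is the cost HPA* amortizes by first searching over clusters and refining cluster-to-cluster segments into cell paths while the agent is already moving.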

DOWNLOAD TR-2174.pdf


Joshua Harry Berlin(1). "Secure User I/O for Applications on an Untrusted Operating System". The University of Texas at Austin, Department of Computer Science(1). Report# HR-14-03 (honors theses). May 8th, 2014. 30 pages.

In this paper I discuss the topic of operating system trust, and specifically why it is important to remove the OS from the trusted computing base. I explore past work on running trusted applications on top of an untrusted OS including InkTag, and analyze why such systems need a mechanism for Secure User I/O. I introduce a mechanism for Secure User I/O that builds on a newer version of InkTag, taking advantage of the unique features of our system to provide an efficient and lightweight path for user input and output. This improved path for user I/O supports both graphical (X-windows) and terminal based applications with no modifications to application code. To our knowledge, this system is the first to provide a comprehensive solution for secure user I/O on an untrusted operating system without modification by the application developer.

DOWNLOAD TR-2173.pdf


Calvin MacKenzie(1). "Integrating Visual and Linguistic Information to Describe Properties of Objects". The University of Texas at Austin, Department of Computer Science(1). Report# HR-14-02 (honors theses). May 7th, 2014. 18 pages.

Generating sentences from images has historically been performed with standalone computer vision systems. The idea of combining visual and linguistic information has been gaining traction in the computer vision and natural language processing communities over the past several years. The motivation for a combined system is to generate richer linguistic descriptions of images, which standalone vision systems are typically unable to produce. Our approach leverages abundant available language data to clean up noisy results from standalone vision systems. This thesis investigates the performance of several models that integrate information from language and vision systems in order to describe certain attributes of objects. The attributes were split into two categories: color attributes and other attributes. Our proposed model was found to be statistically significantly more accurate than the vision system alone for both sets of attributes.

DOWNLOAD TR-2172.pdf


Matthew Hicks(1). "Exploring the Design Space of DRAM Caches". The University of Texas at Austin, Department of Computer Science(1). Report# HR-14-01 (honors theses). May 7th, 2014. 10 pages.

Die-stacked DRAM caches represent an emerging technology that offers a new level of cache between SRAM caches and main memory. As compared to SRAM, DRAM caches offer high capacity and bandwidth but incur high access latency costs. Therefore, DRAM caches face new design considerations that include the placement and granularity of tag storage in either DRAM or SRAM. The associativity of the cache and the inherent behavior and constraints of DRAM are also factors to consider in the design of DRAM caches. In this thesis, we explore and analyze the different factors of DRAM cache design and their impact upon performance; the goal is to identify promising areas of the design space that deserve further study.

DOWNLOAD TR-2171.pdf


Albert Haque(1). "A MapReduce Approach to NoSQL RDF Databases". The University of Texas at Austin, Department of Computer Science(1). Report# HR-13-13 (honors theses). December 13th, 2013. 81 pages.

In recent years, the increased need to house and process large volumes of data has prompted the development of distributed storage and querying systems. The growth of machine-readable RDF triples has prompted both industry and academia to develop new database systems, called "NoSQL" systems, with characteristics that differ from classical databases. Many of these systems compromise ACID properties for increased horizontal scalability and data availability. This thesis concerns the development and evaluation of a NoSQL triplestore. Triplestores are database management systems central to emerging technologies such as the Semantic Web and linked data. A triplestore stores data using the RDF (Resource Description Framework) data model and executes queries written in SPARQL. The triplestore developed here exploits an open-source stack comprising Hadoop, HBase, and Hive. The evaluation spans several benchmarks, including the two most commonly used in triplestore evaluation: the Berlin SPARQL Benchmark and the DBpedia benchmark, a query workload that operates on an RDF representation of Wikipedia. Results reveal that the join algorithm used by the system plays a critical role in dictating query runtimes. Distributed graph databases must carefully optimize queries before generating MapReduce query plans, as network traffic for large datasets can become prohibitive if the query is executed naively.
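
To make the join discussion concrete, here is a sketch of the reduce-side join idiom that MapReduce triplestores commonly compile SPARQL joins into, written as plain Python standing in for one map-shuffle-reduce round; the predicates and data are illustrative, not from the thesis:

```python
from collections import defaultdict

def mr_join(triples, pred_a, pred_b):
    """Find all (x, a, b) such that (x, pred_a, a) and (x, pred_b, b)
    both occur, structured as one MapReduce round: the map phase keys
    each matching triple by its subject, the shuffle groups by key, and
    the reduce phase pairs the two value lists within each key."""
    # map + shuffle: group triples by the shared join variable (subject)
    groups = defaultdict(lambda: ([], []))
    for s, p, o in triples:
        if p == pred_a:
            groups[s][0].append(o)
        elif p == pred_b:
            groups[s][1].append(o)
    # reduce: cross product within each key; this blow-up on skewed keys
    # is the network-traffic hazard of naively executed plans
    return [(s, a, b)
            for s, (as_, bs) in groups.items()
            for a in as_ for b in bs]
```

Every triple matching either pattern crosses the network during the shuffle, which is why join ordering and pre-filtering before plan generation dominate query runtime at scale.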

DOWNLOAD TR-2160.pdf


Kev Kitchens(1). "Predicting and Prefetching TLB Entries from Irregular Access Streams". The University of Texas at Austin, Department of Computer Science(1). Report# HR-13-12 (honors theses). December 12th, 2013. 16 pages.

This thesis describes the design and evaluation of the Irregular Translation Buffer (ITB), a prefetcher for translation lookaside buffer (TLB) entries that is based on the Irregular Stream Buffer (ISB) introduced by Jain and Lin [1]. The main idea is to take the ISB's improvements in prefetching cache lines from irregular access streams and apply them to prefetching TLB entries. The ISB is particularly good at tracking irregular access streams, which are sequences of temporally correlated memory reads; the ITB attempts to translate this strength from the realm of prefetching cache lines to that of prefetching TLB entries. I evaluate the ITB using the MARSS full-system x86 simulator and several dual-core SPECint and SPECfp 2006 benchmarks. I was unable to show that the ITB speeds up these workloads, due to a lack of prefetch candidates that are not already found in the TLB. Based on this result, I discuss possible workloads that could better evaluate the ITB.

DOWNLOAD TR-2159.pdf


Kevin Jia(1). "Improving Data Locality of the Nonsymmetric QR Algorithm". The University of Texas at Austin, Department of Computer Science(1). Report# HR-13-11 (honors theses). December 11th, 2013. 32 pages.

The QR algorithm computes the Schur decomposition of a matrix and is the most popular algorithm for solving eigenvalue problems for dense nonsymmetric matrices. However, the algorithm suffers from a memory-access bottleneck. By restructuring the application of Householder reflectors to the transformation matrix in the nonsymmetric QR algorithm, data locality can be improved, increasing performance. This improvement is demonstrated against DLAHQR, the LAPACK implementation of the implicit QR algorithm for nonsymmetric matrices.
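
For reference, a Householder reflector H = I - 2vv^T/(v^T v) reflects a vector onto a coordinate axis, and repeatedly applying such reflectors is the memory-bound kernel the thesis restructures. This small pure-Python sketch (not the thesis's blocked restructuring) builds a reflector for a vector and applies it without ever forming H explicitly:

```python
import math

def householder_vector(x):
    """Return (v, alpha) such that H = I - 2 v v^T / (v^T v) maps x to
    (alpha, 0, ..., 0). The sign of alpha opposes x[0] to avoid
    cancellation, as in standard LAPACK-style constructions."""
    alpha = -math.copysign(math.sqrt(sum(xi * xi for xi in x)), x[0])
    v = [x[0] - alpha] + list(x[1:])
    return v, alpha

def apply_reflector(v, x):
    """Apply H to x as x - beta * v, using only two dot products;
    H is never formed, which is what makes the memory-access pattern
    (rather than arithmetic) the dominant cost."""
    beta = 2 * sum(vi * xi for vi, xi in zip(v, x)) / sum(vi * vi for vi in v)
    return [xi - beta * vi for vi, xi in zip(v, x)]
```

Applying many such reflectors one column at a time streams the transformation matrix through memory repeatedly; grouping their application, as the thesis does, reuses each loaded block for several reflectors.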

DOWNLOAD TR-2158.pdf


Dustin Carlino(1). "Approximately Orchestrated Routing and Transportation Analyzer: City-scale traffic simulation and control schemes". The University of Texas at Austin, Department of Computer Science(1). Report# HR-13-10 (honors theses). December 11th, 2013. 93 pages.

Along with other intelligent traffic control schemes, autonomous vehicles present new opportunities for addressing traffic congestion. Traffic simulators enable researchers to explore these possibilities before such vehicles are widespread. This thesis describes a new open source, agent-based simulator: the Approximately Orchestrated Routing and Transportation Analyzer, or AORTA. AORTA is designed to provide reasonably realistic simulation of any city in the world with zero configuration, to run on cheap machines, and with an emphasis on easy use and simple code. Experiments described in this thesis can be set up on a new city in about five minutes. Two applications are built on AORTA by creating new intersection control policies and specifying new strategies for routing drivers. The first application, intersection auctions, allows humans to instruct their autonomous vehicle to bid on their behalf at intersections, granting them entry before other drivers who desire conflicting movements. The second, externality pricing, learns the travel time of a variety of different routes and defines a localized notion of cost imposed on other drivers by following the route. This information is used to tax drivers who choose to improve their own trip by inconveniencing others. These two systems demonstrate AORTA's utility for simulating control of the traffic of the near future.

DOWNLOAD TR-2157.pdf


Chris Donahue(1). "Applications of genetic programming to digital audio synthesis". The University of Texas at Austin, Department of Computer Science(1). Report# HR-13-09 (honors theses). December 10th, 2013. 42 pages.

The creation of digital audio synthesis algorithms has historically been a manual task. Tuning the numeric parameters of existing algorithms to produce a desired sound can take hours and still not produce the intended results. Electronic composers who design their own digital instruments are typically trying to accomplish one of two tasks: mimicking an acoustic or electronic instrument they have heard before, or creating something entirely new. This thesis presents two applications of genetic programming to the domain of digital audio synthesis in an attempt to increase the automation and accessibility of these processes. The first application is a command-line tool that can automatically generate synthesis algorithms mimicking the timbre of an input sound file. The second is a VST audio plugin that allows users to interact with genetic programming to search for pleasing or novel synthesis algorithms. The applications are designed to interoperate: synthesis algorithms discovered by the timbre-mimicking tool can be saved and loaded into the VST plugin to be played as MIDI instruments.

DOWNLOAD TR-2156.pdf


Kevin James Peddicord Luecke(1). "Light Curve Fitting on Heterogeneous Computers". The University of Texas at Austin, Department of Computer Science(1). Report# HR-13-08 (honors theses). August 19th, 2013. 16 pages.

The field of computer science is one of the youngest and most rapidly expanding sciences. Since its inception in the mid-1900s, it has revolutionized the other sciences, business, and our personal lives. With increasing capability, however, comes increasing complexity. To cope with this complexity, computer scientists are creating unconventional hardware that is becoming more and more parallel, and with heterogeneous hardware in a single computer it becomes harder to use all of this computational power effectively. GAMA (GPU And Multi-core Aware) addresses this problem by providing a runtime system that dynamically determines how much work should be sent to each available resource. To demonstrate the abilities of this approach, a sample program with real applications in astrophysics has been implemented. This program, Lcfit Theta, attempts to find the size of convection zones in white dwarf stars. Most white dwarfs pulsate, oscillating in temperature and density during their life cycles. The oscillations can be seen from earth-based telescopes, which record a star's brightness as a function of time in a graph known as a 'light curve'. These light curves can be reproduced fairly well by a simple linear sum of sine functions corresponding to the frequencies and amplitudes at which the star is oscillating. However, in stars where a convection zone is present at the surface, non-linear combinations of these sine functions appear in the light curve. Lcfit Theta uses a model of these white dwarfs proposed by Montgomery, which includes the effects of these convective regions, to generate synthetic light curves (9). By comparing these synthetic light curves to the measured stellar data, we can find the input parameters to Montgomery's model that produce a synthetic light curve matching the data. One of these 'fitted' input parameters is the size of the convection zone. This application is the focus of this thesis.
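
The linear part of the light-curve model described above can be sketched in a few lines. This is a minimal illustration, not code from the thesis: the pulsation-mode values are made up, and a real fit would add the non-linear convective terms of Montgomery's model.

```python
import math

def synthetic_light_curve(t, modes):
    # Linear light-curve model: a sum of sine terms, one per
    # pulsation mode given as (amplitude, frequency_hz, phase).
    return sum(a * math.sin(2 * math.pi * f * t + p) for a, f, p in modes)

# Two hypothetical pulsation modes of a white dwarf
modes = [(1.0, 0.002, 0.0), (0.4, 0.005, 1.3)]
brightness = [synthetic_light_curve(t, modes) for t in range(0, 600, 60)]
```

Fitting then amounts to searching for the mode (and convection-zone) parameters whose synthetic curve best matches the observed one.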

DOWNLOAD TR-2149.pdf


James Levitt(1). "Adding Aggressive Early Deflation to the Restructured Symmetric QR Algorithm". The University of Texas at Austin, Department of Computer Science(1). Report# HR-13-07 (honors theses). June 14th, 2013. 22 pages.

The QR algorithm computes the spectral decomposition of a symmetric matrix. Despite its high accuracy, other methods are often preferred for the symmetric eigenvalue problem because of the QR algorithm's relatively poor performance. In recent years, new techniques have arisen that dramatically improve its performance. The restructured symmetric QR algorithm seeks to increase the rate at which operations are performed, thus reducing execution time. Aggressive early deflation seeks to reduce the total number of operations performed. This paper investigates these two techniques and explores the addition of aggressive early deflation to the restructured symmetric QR algorithm.

DOWNLOAD TR-2146.pdf


Joseph Barratt(1). "Fast Houseswapping Segmentation". The University of Texas at Austin, Department of Computer Science(1). Report# HR-13-06 (honors theses). May 15th, 2013. 10 pages.

House swapping problems involve a set of homeowners with preferences over the houses in the market. An allocation is a matching from agents to houses. A great deal of study has gone into the efficient computation of such allocations and the notions of their optimality. In this paper we review several of these notions of optimality and offer an improvement to the efficiency of computing a strict core allocation, improving the previous $O(n^3)$ bound to $O(mn^{1/2})$ plus the lesser of $O(n^2\log n)$ and $O(\min(m^{1/2},n^{2/3})m)$.

DOWNLOAD TR-2131.pdf


Patrick Day(1). "Scoring Docked Protein Complexes with Hydrogen Bonds". The University of Texas at Austin, Department of Computer Science(1). Report# HR-13-05 (honors theses). May 17th, 2013. 13 pages.

Protein-protein complexes can be scored for correctness by many different parameters. Hydrogen bonding can provide specificity to protein-protein complexes, so could potentially be used as a parameter for scoring. This thesis shows the viability of using hydrogen bonds as a scoring function for protein-protein complexes.

DOWNLOAD TR-2130.pdf


Andrew Porter(1). "Computer Science Honors Thesis". The University of Texas at Austin, Department of Computer Science(1). Report# HR-13-04 (honors theses). May 14th, 2013. 13 pages.

In vitro selection has proven itself as a great tool for isolating functional single-stranded RNA sequences from large random pools. However, the multitude of available folding pathways leaves many potentially functional sequences in misfolded states. This research aimed to use kinetic folding simulations of single-stranded RNA to test thermocycling as a means of overcoming energy barriers that prevent sequences from reaching a functional secondary structure. In this pursuit, I tested the negative effects of adding a number of random bases to each end of a known sequence, and showed that additional bases can drastically diminish the folding success of sequences that otherwise fold correctly at a very high rate. I then simulated thermocycling of a typically unsuccessful sequence and demonstrated that high temperatures break most or all base pairs within a sequence, allowing refolding to occur.

DOWNLOAD TR-2128.pdf


Joseph Tessler(1). "Using Cargo-Bot to Provide Contextualized Learning of Recursion". The University of Texas at Austin, Department of Computer Science(1). Report# HR-13-03 (honors theses). May 11th, 2013. 39 pages.

This thesis presents a new method of teaching recursion in which students first play a video game that encourages recursive thinking. Results from a controlled experiment with 47 high school students taking AP Computer Science A indicate that this instructional strategy produces significant improvements in students' understanding of recursion. Additionally, survey results show that nearly every student enjoys the learning activity and is confident in his or her ability to accomplish the recursive exercises.

DOWNLOAD TR-2127.pdf


Prad Nelluru(1). "Fast Approximate k-Nearest Neighbors in High Dimensions". The University of Texas at Austin, Department of Computer Science(1). Report# HR-13-02 (honors theses). May 8th, 2013. 35 pages.

k-nearest neighbors search is widely used in many applications, including computer vision, information retrieval, and machine learning. k-means locality-sensitive hashing is a data-adaptive method for solving this problem approximately. Previous studies of the algorithm focus only on its accuracy with respect to the number of centroids in the k-means phase. In this work, we show the impact of all of the parameters on both accuracy and speed. The parameters are evaluated in a real-world setup, namely search on 128-dimensional SIFT image descriptors. We reveal that surprisingly little work is required in the k-means phase to build indexes that enable fast and accurate searching. Further, we show that accuracy can be increased dramatically by using both multiple indexes and multiple probes. We also present a parallel implementation of k-means locality-sensitive hashing that is 20-25x faster than our baseline version for a desired level of accuracy. These results show that k-means locality-sensitive hashing can be fast and accurate on large, high-dimensional datasets.
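
The index-and-probe idea behind k-means locality-sensitive hashing can be sketched as follows. This is an illustrative toy, not the authors' implementation: the centroids are assumed to come from a prior k-means pass, and the data here are tiny 2-D points rather than SIFT descriptors.

```python
def dist(a, b):
    # Squared Euclidean distance.
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest_cells(p, centroids, k):
    # Indices of the k centroids closest to p.
    return sorted(range(len(centroids)), key=lambda i: dist(p, centroids[i]))[:k]

def build_index(points, centroids):
    # Each k-means cell acts as one hash bucket.
    index = {i: [] for i in range(len(centroids))}
    for p in points:
        index[nearest_cells(p, centroids, 1)[0]].append(p)
    return index

def query(q, index, centroids, probes):
    # Probe the `probes` closest cells, then search them exactly.
    cells = nearest_cells(q, centroids, probes)
    candidates = [p for c in cells for p in index[c]]
    return min(candidates, key=lambda p: dist(p, q))

centroids = [(0.0, 0.0), (10.0, 0.0)]
index = build_index([(1, 0), (2, 0), (9, 0), (11, 0)], centroids)
near1 = query((5.4, 0), index, centroids, probes=1)  # misses the true neighbor
near2 = query((5.4, 0), index, centroids, probes=2)  # finds it
```

Probing more cells raises accuracy at the cost of a larger exact search, which is the speed/accuracy trade-off the parameter study above measures.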

DOWNLOAD TR-2125.pdf


Linhao Zhang(1). "Sentiment Analysis on Twitter with Stock Price and Significant Keyword Correlation". The University of Texas at Austin, Department of Computer Science(1). Report# HR-13-01 (honors theses). April 30th, 2013. 30 pages.

Though uninteresting individually, Twitter messages, or tweets, can provide an accurate reflection of public sentiment when taken in aggregate. In this paper, we primarily examine the effectiveness of various machine learning techniques at assigning a positive or negative sentiment to a tweet corpus. Additionally, we apply the extracted Twitter sentiment to two tasks. We first look for a correlation between Twitter sentiment and stock prices. Second, we determine which words in tweets correlate with changes in stock prices by performing a post hoc analysis of price changes and tweets. We accomplish this by mining tweets using Twitter's search API and subsequently processing them for analysis. For the task of determining sentiment, we test the effectiveness of three machine learning techniques: Naive Bayes classification, Maximum Entropy classification, and Support Vector Machines. We discover that SVMs give the highest consistent accuracy through cross-validation, but not by much. Additionally, we discuss various approaches to training these classifiers. We then apply our findings on an intra-day market scale and find very little direct correlation between stock prices and tweet sentiment at that scale. Next, we improve on the keyword search approach by reverse-correlating stock prices with individual words in tweets, finding, reasonably, that certain keywords are more correlated with changes in stock prices. Lastly, we discuss various challenges posed by using Twitter for stock prediction.

DOWNLOAD TR-2124.pdf


Lauren Yew(1). "cOrcS: Continuation of Orc Security with Static Integrity Checking". The University of Texas at Austin, Department of Computer Science(1). Report# HR-12-11 (honors theses). December 13th, 2012. 25 pages.

Orc, short for "orchestration", is a functional programming language that focuses on solving concurrency problems using process calculus. cOrcS (continuation of Orc Security) is an extension of the Orc programming language which attempts to add a static integrity checking step to the Orc language. Static integrity checking is an important component of security, providing a system to identify and preserve reliable data. This evaluation discusses the ability of a static integrity checking system to be integrated into the Orc programming language, and its potential capabilities after integration.

DOWNLOAD TR-2113.pdf


Benjamin Braun(1). "Compiling computations to constraints for verified computation". The University of Texas at Austin, Department of Computer Science(1). Report# HR-12-10 (honors theses). December 13th, 2012. 26 pages.

We present a compiler that automates the task of converting high-level code to constraint sets of the form accepted by the Ginger and Zaatar protocols for verified computation. Performing the conversion from high-level code to constraints by hand is prone to human error and therefore not practical for large computations. This paper evaluates the performance of the compiler and the effectiveness of its optimizations in reducing the size of the constraint set. We show that the compiler can produce constraint sets for a number of interesting computations, including DNA sequence alignment of 200-nucleotide sequences and partition-around-medoids clustering of 100-dimensional data into two clusters.

DOWNLOAD TR-2112.pdf


Michael Levin(1), Stephen Boyles(1). "A Comparative Analysis of Heuristics for the Improved Convergence of Dynamic Traffic Assignment Models". The University of Texas at Austin, Department of Computer Science(1). Report# HR-12-09 (honors theses). December 13th, 2012. 58 pages.

Simulation-based dynamic traffic assignment (DTA) provides a high level of detail useful for realistic modeling in transportation planning. However, mathematical optimization of the solution algorithm is difficult because of discretized space and time, and the resulting computation times, measured in hours or even days, can discourage use. This work compares and analyzes experimental results from two types of techniques for reducing computation time: a static traffic assignment (STA) warm start to DTA, and existing and novel heuristics ranging from variations on the method of successive averages to gradient-based methods similar to those used in STA. Warm start results indicate potential improvements in computation time and possibly in solution convergence. Some DTA solution heuristics improve convergence significantly in a small network, suggesting potential benefit for larger networks. Further research may make either technique viable for practitioners.

DOWNLOAD TR-2111.pdf


Adrian Lopez-Mobilia(1). "Inverse Kinematics Kicking in the Humanoid RoboCup Simulation League". The University of Texas at Austin, Department of Computer Science(1). Report# HR-12-08 (honors theses). August 21st, 2012. 17 pages.

This paper describes the inverse kinematics kicking system used in UT Austin Villa's submission to the humanoid RoboCup simulation league competition. It describes the infrastructure we designed and how machine learning is used to optimize a set of kicks, with the hope of developing a versatile and omnidirectional kick. As of now, the ability to kick has not substantially affected the outcome of the 3D simulation competition, as walk speed and dribbling have been the main focus and the strongest factors in a team's competitive success. The underlying goal is to create a set of kicks that can be optimized using machine learning to be faster to align, while achieving distance and accuracy similar to fixed keyframe-based skills. To evaluate this goal we test the kick over many iterations against varied opponents and situations. We hypothesize that it is possible to create an abstracted representation of skills such as kicking through inverse-kinematics-based curves, simultaneously achieving the performance goals we pose and allowing much simpler human interaction with the system.

DOWNLOAD TR-2094.pdf


Nona Sirakova(1), Kristen Grauman(1). "Body Pose As Indicator Of Human-Object Interaction". The University of Texas at Austin, Department of Computer Science(1). Report# HR-12-07 (honors theses). August 9th, 2012. 28 pages.

Human-object interaction recognition is important for detection of particular actions, small or partially occluded objects and video analysis. Our approach analyzes body poses and determines which ones are indicative of human-object interaction. We tested the method using videos containing different household actions. On average we obtained between 70 and 90 percent precision for the videos we tested on.

DOWNLOAD TR-2092.pdf


Michael Joseph McCoyd(1). "MinVisor: Provable Machine Protection with Optional Fidelity". The University of Texas at Austin, Department of Computer Science(1). Report# HR-12-06 (honors theses). June 5th, 2012. 78 pages.

The hypervisors securing our expanding cloud services are not as secure as we would like. Vulnerabilities in resource protection have allowed unprivileged guests to change arbitrary host data. The relatively smaller size of hypervisors does not seem to guarantee their safety. Yet formal methods of mechanical program verification have recently verified a 9,300 line operating system, thus it seems plausible to verify a small hypervisor.

DOWNLOAD TR-2086.pdf


Lucy Liang(1). "Active Learning of Image Ranking Over Relative Visual Attributes". The University of Texas at Austin, Department of Computer Science(1). Report# HR-12-05 (honors theses). May 5th, 2012. 68 pages.

Visual attributes are human-nameable, cross-categorical image concepts used by people every day to describe objects. For instance, cats are "furry" while turtles are not. Some shoes are "shiny" while others are not. These human-intuitive attributes have significant uses in the computer vision field. Yet it is even more intuitive, and informative, to be able to ascribe the degree of some attribute in an image relative to that of others. If asked to describe a donkey, we are much less inclined to name all the attributes associated with the animal and are more likely to describe it in terms of a similar animal, using relative descriptions such as "like a horse but shorter and with longer ears." In this way, we can use relative attributes to build a richer vocabulary for textually describing an assortment of images. Currently, there exist rank learning methods that are useful for modeling relative visual attributes, but these methods generally require training samples in the form of relative partial orderings, labeled by humans. However, annotating training samples can be very time-consuming. In this work, we investigate three active learning methods for image ranking over relative visual attributes designed to minimize the annotation effort. The active learners seek to select training samples that provide the most useful information by choosing the most ambiguous image samples for annotation at each step in the learning loop. We accomplish this by using a low-margin approach to sample selection. Two of our three active learners operate exclusively under this condition. For the final proposed active learner, we introduce a novel form of active sample selection that selects training samples that are both visually diverse and satisfy the low-margin property. 
Experimental results indicate that on average, the active learner employing a combination of the low-margin and visual-diversity property performs best, as it outperforms the other active learners that operate solely under the low-margin condition. In addition, all three active learners consistently outperform the baseline approach of random sample selection, supporting the effectiveness of our proposed active learning methods as applied to image ranking over relative visual attributes.

DOWNLOAD TR-2085.pdf


Joanna Smith(1). "Using Peer Review to Teach Software Testing". The University of Texas at Austin, Department of Computer Science(1). Report# HR-12-04 (honors theses). May 15th, 2012. 20 pages.

This paper explains how peer review can be used to teach software testing, an important skill that is typically not carefully taught in most programming courses. The goals of our peer review are (1) to frame testing as a fun and competitive activity, (2) to allow students to learn from each other, and (3) to demonstrate the importance of testing by uncovering latent bugs in the students' code. This paper describes our experience of introducing peer review to an honors data structures course that has a heavy programming load. We evaluate our intervention by reporting on student attitudes towards software testing throughout the course.

DOWNLOAD TR-2084.pdf


Bruno Li(1). "Physically based modeling of interactive plants". The University of Texas at Austin, Department of Computer Science(1). Report# HR-12-03 (honors theses). May 18th, 2012. 27 pages.

Producing realistic vegetation is an important challenge in interactive applications. Most modeling efforts have focused on realistic graphics, neglecting the distinctive soft-body motion of vegetation. The resulting trees approach photorealism, but stand still like statues, or at best exhibit an unconvincing swaying motion. This thesis studies an approach to tree animation using soft-body physics techniques, as well as a framework to automatically build such simulations from an L-system grammar. The resulting trees are interactive, exhibit realistic motion and are fast enough to be used in real-time. The physics simulation supports arbitrary external forces, and also allows for easy addition of game logic: as an example, rain and fire effects affecting trees were built on top of the simulation.

DOWNLOAD TR-2083.pdf


John Roesler(1). "Evaluating Topic Modeling as Preprocessing for a Sentiment Analysis Task". The University of Texas at Austin, Department of Computer Science(1). Report# HR-12-02 (honors theses). May 5th, 2012. 14 pages.

Classifying the sentiment of documents is a well-studied problem in Natural Language Processing (NLP). The existence of excellent discriminative classifiers like Maxent has pushed the main body of research in the direction of feature engineering. In this paper, I examine an unusual class of features, the document-topic proportions assigned by the Latent Dirichlet Allocation topic model. In particular, I evaluate the relative performance of topic proportions alone, topic proportions together with unigrams, and unigrams alone. My findings are that topic proportions do not add enough information to significantly improve classification performance. I also perform domain adaptation experiments, finding that (although topic proportions do not outperform unigram features) there is some promise in classification based on topic proportions.

DOWNLOAD TR-2080.pdf


Aidan Coyne(1). "Interpretable features for meaning in context". The University of Texas at Austin, Department of Computer Science(1). Report# HR-12-01 (honors theses). May 1st, 2012. 11 pages.

Recently, interpretable vector spaces have been introduced: vector spaces in which individual dimensions are meaningful. For polysemous words, interpretable vectors mix features from different senses. In this paper, we propose the task of computing context-specific interpretable representations for individual senses of a word. Focusing on computing features that describe the circumstances of an event, we find that the task is feasible: the best vector contextualization model, component-wise multiplication, achieves positive ratings in 83% of all cases, compared to 58% for a baseline model.
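
Component-wise multiplication, the best-performing contextualization model above, is simple to state concretely. The feature names and weights below are invented for illustration; real interpretable dimensions would come from the vector space the paper builds on.

```python
def contextualize(word_vec, context_vec):
    # Component-wise multiplication: interpretable dimensions
    # supported by both the word and its context survive;
    # dimensions belonging to other senses are damped toward zero.
    return {f: w * context_vec.get(f, 0.0) for f, w in word_vec.items()}

# Toy interpretable vector for the polysemous word "bat"
bat = {"animal": 0.9, "wings": 0.8, "wooden": 0.7, "sport": 0.6}
# A baseball-flavored context vector
ctx = {"animal": 0.1, "wings": 0.0, "wooden": 0.9, "sport": 0.8}
in_context = contextualize(bat, ctx)  # baseball-sense features now dominate
```

The result keeps the interpretability of the original space: each surviving dimension is still a nameable feature, just reweighted for the sense in context.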

DOWNLOAD TR-2079.pdf


Michael Miller(1). "PoComON: A POlicy-COMpliant Overlay Network". The University of Texas at Austin, Department of Computer Sciences(1). Report# HR-11-04 (honors theses). October 31st, 2011. 22 pages.

On today's commodity internet, senders have no control over the path that their packets take beyond the first hop. This aspect of today's internet prevents many potential network policies from being implemented. Source routing has been proposed as a mechanism to solve this problem, in which the sender chooses the path the traffic will take. However, previous approaches have either not enforced policies specified by the sender, or have been prohibitively expensive in terms of computing power. icing-pvm addresses these concerns by efficiently implementing source-based routing in a manner which allows path preferences to be enforced. PoComON builds on top of icing-pvm and provides a transitional overlay network to deploy icing-pvm on today's internet while supporting legacy applications.

DOWNLOAD TR-2054.pdf


Andy Luong(1). "Reconstructing a Fragmented Face from an Attacked Secure Identification Protocol". The University of Texas at Austin, Department of Computer Sciences(1). Report# HR-11-03 (honors theses). May 9th, 2011. 55 pages.

Secure facial identification systems compare an input face to a protected list of subjects. If this list were to be made public, there would be a severe privacy/confidentiality breach. A common approach to protect these lists of faces is to store a representation (descriptor or vector) of the face that is not directly mappable to its original form. In this thesis, we consider a recently developed secure identification system, Secure Computation of Face Identification (SCiFI), that utilizes an index-based facial vector to discretely compress the representation of a face image. A facial descriptor of this system does not allow for a complete reverse mapping. However, we show that if a malicious user is able to obtain a facial descriptor, it is possible that he/she can reconstruct an identifiable human face. We develop a novel approach to initially assemble the information given by the SCiFI protocol to create a fragmented face. This image has large missing regions due to SCiFI's facial representation. Thus, we estimate the missing regions of the face using an iterative Principal Component Analysis (PCA) technique. This is done by first building a face subspace based on a public set of human faces. Then, given the assembled image from the SCiFI protocol, we iteratively project this image in and out of the subspace to obtain a complete human face. Traditionally, PCA reconstruction techniques have been used to estimate very small or specific occluded regions of a face image; these techniques have also been used in facial recognition such as through a k-nearest neighbor approach. However, in our new method, we use it to synthesize 60% to 80% of a human face for facial identification. We explore novel methods of comparing images from different subspaces using metric learning and other forms of facial descriptors. We test our reconstruction with face identification tasks given to a human and a computer. 
Our results show that our reconstructions are consistently more informative than what is extracted from the SCiFI facial descriptor alone. In addition, these tasks show that our reconstructions are identifiable by humans and computers. The success of our approach implies that a malicious attacker could expose the faces on a registered database.

DOWNLOAD TR-2038.pdf


Matthew Christiaan de Wet(1). "Avoiding Premature Convergence in NeuroEvolution by Broadening the Evolutionary Search". The University of Texas at Austin, Department of Computer Sciences(1). Report# HR-11-02 (honors theses). March 9th, 2011. 63 pages.

This paper introduces a novel, general-purpose neuroevolution method named Broadened neuroEvolutionary Algorithm for Searching Topologies (BEAST). The goal of this algorithm is to overcome premature convergence in evolution on dynamic problems or environments. Premature convergence is one of the major causes of failure in evolutionary artificial neural networks (EANNs), since it slows or halts evolution when local optima or changes to the objective function are present. BEAST combines several different techniques to reduce the effects of convergence, each of which targets one or more of its major causes. The most radical of these techniques applies a meta-search to direct evolution from a higher level than previous approaches. This thesis tests the performance of BEAST against convergence in three major types of dynamic problems.

DOWNLOAD TR-2020.pdf


Adam Dziuk(1). "Creating Intelligent Agents through Shaping of Coevolution". The University of Texas at Austin, Department of Computer Sciences(1). Report# HR-11-01 (honors theses). February 9th, 2011. 16 pages.

Producing agents that are effective in video games and video-game-like environments is a difficult task for which powerful solutions are needed. One such solution, shaping, has been applied to standard evolutionary algorithms in the past, in environments such as the NERO video game, with considerable success. Another solution, coevolution, has been shown to be a promising approach, but has been difficult to control. In this paper, the shaping approach is applied to coevolution in an attempt to enhance the already powerful effects of coevolution in competitive environments. Several example automated shaping methods are introduced and tested in a simple capture-the-flag-like environment using the OpenNERO research platform and rtNEAT for neuroevolution. This paper demonstrates that each of these shaping methods is superior to a control, and further analyzes the results to determine general rules for shaping coevolution. While different methods of shaping apply differently to different environments, using the approach detailed here it should be possible to use coevolution to create intelligent agents in a variety of situations.

DOWNLOAD TR-2015.pdf


Jeff Donahue(1). "Image Classifications with Annotator Rationales". The University of Texas at Austin, Department of Computer Sciences(1). Report# HR-10-10 (honors theses). December 3rd, 2010. 51 pages.

In image classification tasks, the traditional supervised learning approach is for annotators to simply label each image with its class name. We contend that this approach may waste potentially valuable information: the reasoning that went into the choice. This information can best be exploited in tasks where an element of subjectivity or perception is involved in the annotation. Hence, in a new approach, we enrich this simple categorical annotation by augmenting it with a "rationale": a polygon drawn around the region(s) of the image that the annotator found most influential in his or her classification decision. This is distinct from foreground segmentation, as the entirety of the foreground may not have been influential to the annotator's decision. To make use of this extra information, when creating a representation for the features of an image in the training set, we give special treatment to those features that fall inside of a rationale polygon. We have tested our approach on a scene classification task, with results showing that this extra bit of information is highly useful in deciding whether an image belongs to certain scene categories. We have further applied our approach to the more subjective task of deciding whether a person in an image is attractive, and have seen promising preliminary results in this domain as well.

DOWNLOAD TR-2006.pdf


Matthew Johnston(1). "An Analysis of Distributed Decision Making Methodologies in Role Playing Video Games". The University of Texas at Austin, Department of Computer Sciences(1). Report# HR-10-09 (honors theses). May 20th, 2010. 13 pages.

Some environments or tasks may require multiple agents to work together to survive or thrive. Turn-based combat, found in many popular role-playing games, is one such environment. This paper examines a method to expose agents controlled by neural networks to both the solo and group cases in this domain, as well as difficulties presented by the transition from solo to group combat. The results suggest that, dependent on the diversity of agents' skills, agents may choose heterogeneous behaviors regardless of reward scheme.

DOWNLOAD TR-1975.pdf


Vincent Liu(1). "A Compiler Alternative to Expression Templates". The University of Texas at Austin, Department of Computer Sciences(1). Report# HR-10-08 (honors theses). May 20th, 2010. 48 pages.

Container classes in object-oriented languages often suffer from the problem of compiler-generated temporaries. Storage is allocated for each intermediate result in each expression, introducing huge performance penalties. Loop fusion and array contraction, the compiler optimizations most closely related to this problem, are not powerful enough to optimize these container classes. The current workaround is a manual optimization called Expression Templates (ETs), which uses C++ templates to perform arbitrary code generation but is a messy and inefficient approach. We present a compiler alternative to ETs that uses programmer annotations to eliminate temporaries through expression fusion. Our solution is much better for programmer productivity than ETs, and also achieves slightly better results.

DOWNLOAD TR-1974.pdf


John Wright(1). "Membership Query Removal Implies Non-trivial Cryptographic Primitives". The University of Texas at Austin, Department of Computer Sciences(1). Report# HR-10-07 (honors theses). May 17th, 2010. 13 pages.

The extent to which membership queries help learners learn concepts has been well studied for many years. In a result by Angluin and Kharitonov, it was shown that when trying to learn many sufficiently powerful concepts, learners do not benefit from membership queries. We investigate the converse of this statement and are able to prove some weakened versions of the converse.

DOWNLOAD TR-1973.pdf


Timothy Nodine(1). "Speciation in NEAT". The University of Texas at Austin, Department of Computer Sciences(1). Report# HR-10-06 (honors theses). May 17th, 2010. 9 pages.

This paper investigates speciation in NEAT and its effect on NEAT's performance. Speciation is the set of processes by which NEAT creates, maintains, and uses several disjoint groups of similar genomes for guiding reproduction. This paper also examines the issue of adoption, where mating within a species produces an individual in a different species. It then examines the relationship between NEAT's performance, the adoption rate, and the number of species. The paper demonstrates that while species and adoptions are necessary, too many species and adoptions harm the algorithm's performance.
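
The speciation mechanism the paper studies can be sketched with NEAT's standard greedy assignment. This is a simplified illustration: NEAT's real compatibility distance combines excess and disjoint genes with average weight differences, whereas the genomes and distance function below are stand-ins.

```python
def speciate(genomes, distance, threshold):
    # Greedy NEAT-style speciation: each genome joins the first
    # species whose representative lies within the compatibility
    # threshold; otherwise it founds a new species.
    species = []  # list of (representative, members) pairs
    for g in genomes:
        for rep, members in species:
            if distance(g, rep) < threshold:
                members.append(g)
                break
        else:
            species.append((g, [g]))
    return species

# Toy one-dimensional "genomes" with absolute difference as distance
groups = speciate([0.0, 0.1, 5.0, 5.2, 9.9], lambda a, b: abs(a - b), 1.0)
```

Raising the threshold merges species and lowering it splits them, which is one lever behind the species-count effects the paper measures.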

DOWNLOAD TR-1972.pdf


Martin Schatz(1). "Heuristic Approach to Organic Chemistry". The University of Texas at Austin, Department of Computer Sciences(1). Report# HR-10-05 (honors theses). May 12th, 2010. 17 pages.

An important question in organic chemistry is how a chemist can synthesize chemicals in the most efficient way, beginning with chemicals readily available. These are known as synthesis problems. Currently, methods based on databases (Chen, 2006) and methods related to quantum mechanics (Jorgensen, 1998) are frequently used to solve them. When solving synthesis problems by hand, a key step is to identify the functional groups of molecules, as this gives the experimenter a set of applicable reactions. The functional groups required for different reactions are experimentally derived and accepted into a knowledge base. Instead of relying on this knowledge base for functional group identification, perhaps there is a way to achieve the same level of accuracy using large datasets of reactions. This paper investigates a heuristic method of determining the functional groups of a given reaction type.

DOWNLOAD TR-1971.pdf


Kyle C. Hale(1). "Segment Gating for Static Energy Reduction with Introspective Networks-on-Chip". The University of Texas at Austin, Department of Computer Sciences(1). Report# HR-10-04 (honors theses). May 11th, 2010. 18 pages.

Chip multiprocessors (CMPs) have emerged as a primary vehicle for overcoming the limitations of uniprocessor scaling, with power constraints now representing a key factor of CMP design. Recent studies have shown that the on-chip interconnection network (NOC) can consume as much as 36% of overall chip power. To date, researchers have employed several techniques to reduce power consumption in the network, including the use of on/off links by means of power gating. However, many of these techniques only target dynamic power, and those that consider static power focus exclusively on flit buffers. This thesis aims to reduce static power consumption through a comprehensive approach that targets buffers, switches, arbitration units, and links. I establish an optimal power-down scheme which is used as an upper bound to evaluate several static policies on synthetic traffic patterns. I also evaluate dynamic utilization-aware power-down policies using traces from the PARSEC benchmark suite. Finally, I show that both static and dynamic policies can greatly reduce static energy at low injection rates with only minimal increases in dynamic energy and latency.

DOWNLOAD TR-1970.pdf


Christopher Bush(1). "An Analysis of Automated Decision Making Methodologies in Role Playing Video Games: Centralized Approach". The University of Texas at Austin, Department of Computer Sciences(1). Report# HR-10-03 (honors theses). May 11th, 2010. 18 pages.

This paper analyzes an approach to evolving intelligent agents that work together to accomplish a common task in a role-playing video game environment. A platform was created that allows research into the capabilities of automated neuro-evolving agents in team (party) environments working together to defeat a common enemy or perform a common task. This research focuses on a centralized methodology: all party members share a common agent, or brain. Party members are given a set of qualities containing the information that either a player or a computer enemy would need in a MUD or MMORPG combat situation. These qualities are then used by the centralized agent to make decisions for each party member. The shared agent is rewarded based on the performance of the party as a whole. This research examines the effectiveness and limitations of that centralized agent in this environment.

DOWNLOAD TR-1969.pdf


David Robson(1). "Hierarchical Neural Networks for Behavior-Based Decision Making". The University of Texas at Austin, Department of Computer Sciences(1). Report# HR-10-02 (honors theses). May 10th, 2010. 16 pages.

This paper introduces the concept of Hierarchical Neural Networks as a viable strategy for behavior-based decision making in certain contexts. Hierarchical Neural Networks (HNNs) refers in this case to a system in which multiple neural networks are connected in a manner similar to an acyclic graph. In this way, responsibility can be divided among the networks in each layer, simplifying the input and output vectors and the overall complexity of each network, resulting in improved performance. This approach is shown to outperform a single neural network when both systems are tasked with learning a survival strategy incorporating several behaviors in a real-time environment.

DOWNLOAD TR-1968.pdf


David Federman(1). "Applying Formal Concept Analysis to Cascading Style Sheets". The University of Texas at Austin, Department of Computer Sciences(1). Report# HR-10-01 (honors theses). May 10th, 2010. 12 pages.

Cascading Style Sheets (CSS) are used in the HyperText Markup Language (HTML) to describe the style, size, color, and position of elements in a document. While simple styles are easy to specify, a style sheet for a complex site can become many thousands of lines long. One problem in style sheets is that as they grow there is a tendency for increasing duplication of styles and properties, especially when the style sheet is used for many pages created by multiple authors. Formal Concept Analysis (FCA) is a technique for eliminating redundancy while identifying common concepts in a complex space of property definitions. In this work we use FCA to optimize style sheets to reduce redundancy, merge several rules together, and group the selectors with their declarations to express general formatting concepts. Two problems we solved include converting complex style sheets into a form on which FCA can be applied, and interpreting the resulting concept lattice to avoid introducing too many styles for small concepts. We evaluate the effectiveness of the solution on several style sheets.
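
The core FCA step, pairing a set of selectors with the declarations they all share, can be sketched as follows (a toy model, not the thesis tool; the `rules` data and both helper names are invented for illustration):

```python
# Minimal FCA-style sketch: selectors are the objects, declarations are the
# attributes, and a formal concept pairs a selector set with the set of
# declarations common to all of them.

rules = {
    ".btn":  {"color:red", "margin:0"},
    ".nav":  {"color:red", "margin:0", "float:left"},
    ".logo": {"color:red"},
}

def intent(selectors):
    # Declarations shared by every selector in the set.
    return set.intersection(*(rules[s] for s in selectors))

def extent(decls):
    # Selectors whose rules contain every declaration in decls.
    return {s for s, d in rules.items() if decls <= d}

# Closing {".btn", ".nav"} yields the shared pair, which a single merged rule
# ".btn, .nav { color: red; margin: 0 }" could express once.
shared = intent({".btn", ".nav"})
assert shared == {"color:red", "margin:0"}
assert extent(shared) == {".btn", ".nav"}
```

Each such concept corresponds to one grouped rule in the optimized style sheet; the interpretation problem the abstract mentions is deciding which concepts are worth emitting.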

DOWNLOAD TR-1967.pdf


Young-suk Lee(1). "Developing Scalable Quartet Tree Encodings". The University of Texas at Austin, Department of Computer Sciences(1). Report# HR-09-13 (honors theses). December 19th, 2009. 31 pages.

Reconstructing the Tree of Life, the evolutionary history of all species, stands as one of the most significant and intensive problems in computational biology. One approach to this grand project is to use supertree methods that merge a set of smaller trees (or source trees) into one single tree. In practice, most biologists use a particular supertree method called Matrix Representation with Parsimony (MRP) due to its topological accuracy as compared to most other methods. Recently, Snir and Rao presented a new supertree method that first encodes the source trees as a set of four-leaf trees and then uses Quartet Maxcut (QMC) on these quartet trees to compute a single overall tree. On certain realistic model conditions, this supertree method using a particular quartet encoding, Exp+TSQ, was shown to outperform MRP in terms of topological accuracy. However, this supertree method has several limitations. First, it fails to complete on many cases. Second, its subroutine Exp+TSQ is computationally intensive because it examines all possible quartets. These limitations discourage the use of QMC on Exp+TSQ. Thus, we extend the QMC study in the hope of designing a new scalable quartet encoding that would further improve this supertree estimation. Our quartet encodings are based on two ideas: the examination of all possible quartets on large trees is unnecessary, and the taxon sampling density of the source tree should be taken into account in the encoding. We propose an alternative time-efficient and robust encoding, UniformK+TSQ*, that may be substituted for Exp+TSQ without compromising the accuracy of the supertree method.

DOWNLOAD TR-1946.pdf


Shane Pope(1). "Port-Scanning Resistance in Tor Anonymity Network". The University of Texas at Austin, Department of Computer Sciences(1). Report# HR-09-12 (honors theses). December 10th, 2009. 15 pages.

The Onion Router (Tor) is an anonymity network that allows users to perform web browsing and other internet activities anonymously. A Tor user's traffic is routed securely through several other Tor relays before making its way to its destination. No one except the final Tor router in this circuit knows the final destination of the user's traffic, and each router in the circuit only knows about the previous and next router. Tor users get the list of Tor IP addresses from a dedicated directory server which lists most of the Tor routers. With this list they can create random circuits through the internet to route their traffic. Governments that censor the internet with country-wide firewalls want to block Tor, because it allows users to circumvent the censorship. China has begun blocking Tor by downloading the list of all public Tor IP addresses and blocking them [1]. There are still options for internet users in China to access the Tor network. One option is unpublished Tor relays, whose internet addresses are shared via email and instant messenger instead of in a public directory like normal Tor relays. Since these unpublished routers cannot be easily downloaded in bulk like the published Tor routers, detecting and blocking unpublished routers is the obvious next step for China and other censoring nations. Currently it is possible to detect these unpublished Tor relays by running Tor and attempting to connect to every internet address on the ports Tor commonly uses. If a computer responds as Tor would, it is running Tor and its internet address can be blocked. In this paper I present and implement a protocol which makes these unpublished relays harder to detect by hiding them behind a web server to prevent this type of scanning.

DOWNLOAD TR-1943.pdf


Chau M. Nguyen(1). "Constructing Drivability Maps From 3D Laser Range Data For Autonomous Vehicles". The University of Texas at Austin, Department of Computer Sciences(1). Report# HR-09-11 (honors theses). December 10th, 2009. 17 pages.

The field of robotics is making successful progress towards endowing robots with human-like capabilities. One example is the well-known DARPA Urban Challenge, which demonstrated an autonomous ground vehicle with sophisticated software that could drive itself while interacting with other vehicles, human beings, and obstacles on the road. In the recent DARPA Urban Challenge, road blockage locations were unknown to contestants, but pre-specified routes were provided that included all accessible road segments and other information such as waypoints, stop sign locations, and lane width. When driving a car in real life, human beings may not have knowledge of drivable road segments in advance. Instead, they make decisions on how to drive based on what they see at the moment. Hence, it is necessary for autonomous vehicles to have the same capability of recognizing the drivable roads at their current positions. Murarka and Kuipers presented a real-time stereo-vision-based mapping algorithm for identifying and modeling various hazards in urban environments by constructing a 3D model and segmenting it into distinct traversable ground regions [1]. However, their algorithm works in real time only within the robot's immediate vicinity of 10 x 10 meters, a relatively small space compared to our vehicle's immediate vicinity of 120 x 120 meters. The goal of this thesis is to present a revision of their algorithm that allows the vehicle to discover accessible road segments along its path without relying on pre-specified routes. This is particularly important in off-road situations where accessible road segments may not be pre-defined. The algorithm is expected to significantly improve the vehicle's performance in detecting and avoiding obstacles as well as navigating safely to the destination.

DOWNLOAD TR-1942.pdf


Brandon Plost(1). "Protein Aggregation Optimization: An Algorithmic Approach". The University of Texas at Austin, Department of Computer Sciences(1). Report# HR-09-10 (honors theses). December 10th, 2009. 28 pages.

Parkinson's, Alzheimer's, and Huntington's Diseases are primarily caused by a biological phenomenon called protein aggregation. Protein aggregation occurs when multiple proteins stick together, which renders the proteins useless and also blocks other proteins from performing their jobs. If we can understand how to prevent protein aggregation, we can gain insight into how to treat related diseases. Since mutations in proteins can change the probability that a protein aggregates, we want to find the set of substitutions that gives the lowest such probability. It is infeasible to optimize the aggregation propensity with mutations in a laboratory, because it can take up to six months to make one mutant, so computational optimization is the only alternative. Thus, we think of a protein as a string over a specific alphabet, and we think of the probability that aggregation occurs as a real-numbered score associated with a given string. Different character substitutions in the string (mutations in the protein) can change the aggregation score. We present an algorithm that lowers the aggregation score under biological constraints; the algorithm limits outputs to those which have the same biological function as the input string, minimizes the Hamming distance between the input and output strings, and minimizes running time. Our implementation of the algorithm lowers the aggregation score of the input string by an average of 40 percent on a set of over 30 test strings from the yeast organism S. cerevisiae. To achieve this result, we experimented with optimizing pieces of the algorithm which affect the output: threshold variables which determine equivalence classes, equations which choose one string over another and which determine internal state changes, and the maximum number of steps allowed.
In doing so, we have created not only a procedure that can lower aggregation probability, but one that can be generalized to a broader set of optimization problems beyond protein aggregation.
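
The search framing, though not the biological scorer, can be sketched as a greedy single-substitution descent (everything below is a toy stand-in: the alphabet, the `score` function, and the tie-breaking rule are invented for illustration):

```python
# Toy sketch of string-score minimization under a Hamming-distance
# preference. The real algorithm uses a biologically derived aggregation
# score and function-preserving constraints; this scorer is hypothetical.

ALPHABET = "ACDE"  # stand-in for the amino-acid alphabet

def score(s):
    return s.count("A")  # toy aggregation propensity

def hamming(s, t):
    return sum(a != b for a, b in zip(s, t))

def minimize(start, max_steps=10):
    cur = start
    for _ in range(max_steps):
        # Best single-character substitution, preferring low score, then
        # minimal Hamming distance from the original input.
        best = min(
            (cur[:i] + ch + cur[i + 1:] for i in range(len(cur)) for ch in ALPHABET),
            key=lambda cand: (score(cand), hamming(cand, start)),
        )
        if score(best) >= score(cur):
            return cur  # local minimum reached
        cur = best
    return cur

out = minimize("ACAD")
assert score(out) == 0 and hamming(out, "ACAD") == 2
```

Each iteration corresponds to one accepted mutation; the equivalence-class thresholds and state-change rules mentioned in the abstract would replace the simple key function here.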

DOWNLOAD TR-1941.pdf


Tarun Nimmagadda(1). "Building an Autonomous Ground Traffic System". The University of Texas at Austin, Department of Computer Sciences(1). Report# HR-09-09 (honors theses). August 10th, 2009. 27 pages.

Ground traffic systems are the combination of ground vehicles, roads, highways and intersection infrastructure that make it possible for people to commute on the road. Recent advances in the field of robotics allow us to increase the autonomy of currently human-operated ground traffic systems. This document presents the key software components developed as a part of this thesis research to create an autonomous vehicle platform; specifically, a lane-based obstacle tracking system which enabled our autonomous vehicle, Marvin, to successfully handle dynamic traffic. Furthermore, the Autonomous Intersection Management (AIM) protocol designed by Dresner and Stone [7] shows that intersection efficiency can be significantly increased if the intersection can communicate wirelessly with autonomous vehicles. Towards this end, this document presents the extensions made to Marvin's capabilities so that it would communicate with the AIM system, thereby bringing us closer to an Autonomous Ground Traffic System.

DOWNLOAD hr09-09.pdf


Jose Falcon(1), William R. Cook(1). "Gel: A Generic Extensible Language (Fall 2008)". The University of Texas at Austin, Department of Computer Sciences(1). Report# HR-09-08 (honors theses). May 27th, 2009. 20 pages.

Both XML and Lisp have demonstrated the utility of generic syntax for expressing tree-structured data. But generic languages do not provide the syntactic richness of custom languages. Generic Extensible Language (Gel) is a rich generic syntax that embodies many of the common syntactic conventions for operators, grouping and lists in widely-used languages. Prefix/infix operators are disambiguated by white-space, so that documents which violate common white-space conventions will not necessarily parse correctly with Gel. With some character replacements and adjusting for mismatch in operator precedence, Gel can extract meaningful structure from typical files in many languages, including Java, Cascading Style Sheets, Smalltalk, and ANTLR grammars. This evaluation shows the expressive power of Gel, not that Gel can be used as a parser for existing languages. Gel is intended to serve as a generic language for creating composable domain-specific languages.

DOWNLOAD hr09-08.pdf


Josh Gunter(1). "On Avoidable Two Elements Sets of Partial Words (Fall 2008)". The University of Texas at Austin, Department of Computer Sciences(1). Report# HR-09-07 (honors theses). May 26th, 2009. 30 pages.




Mark Reitblatt(1). "A System for the Formal Verification of LabVIEW/G Diagrams (Spring 2009)". The University of Texas at Austin, Department of Computer Sciences(1). Report# HR-09-06 (honors theses). May 20th, 2009. 25 pages.

The purpose of this thesis is to present a prototype system for the verification of LabVIEW programs using ACL2. LabVIEW is a graphical dataflow programming language commonly used in data acquisition and control applications. ACL2 is a programming language, formal logic, and theorem prover that has seen broad use in the verification of industrial hardware and software systems. I will present an ACL2 model of LabVIEW programs, a system for translating LabVIEW programs into their ACL2 models, a method for annotating LabVIEW programs with specifications, a methodology for the verification of the ACL2 model of a LabVIEW program with respect to its specification, and a library of theorems about our ACL2 models of LabVIEW primitives designed to increase the ease and accessibility of verification. I will also present several small examples of verified LabVIEW programs to demonstrate the application of the system.

DOWNLOAD hr09-06.pdf


Yonatan Bisk(1). "The Necessity of Separating Control and Logic When Grounding Language Using Neuroevolution (Spring 2009)". The University of Texas at Austin, Department of Computer Sciences(1). Report# HR-09-05 (honors theses). May 19th, 2009. 36 pages.

In this research we analyze the task of evolving a neural network to understand simple English commands. By understand we mean that the final agent will perform tasks and interact with objects in its world as instructed by the experimenter. The lexicon and grammar are kept small in this experiment. This type of work, where semantics are based on an agent's perceptions and actions, is referred to as language grounding. A substantial literature exists in this area, as the ability to have a robot or computer understand natural language instructions is one major goal of Artificial Intelligence, but very little has been done that grounds language in actions. We discover that grounding in actions requires a separation of logic and motor control.

DOWNLOAD hr09-05.pdf


Nathaniel Tucker(1). "Evolving Adaptive Intelligence: Using NeuroEvolution with Temporal Difference Methods in the Game Domain (Spring 2009)". The University of Texas at Austin, Department of Computer Sciences(1). Report# HR-09-04 (honors theses). May 19th, 2009. 20 pages.

Adaptive intelligence is the ability of a system to respond quickly and effectively to changes in its environment during its lifetime. Responding quickly to change can be greatly beneficial in many domains, where unpredictable changes require generalizing adaptive behavior. Although there are many online reinforcement learning systems, manually selecting the proper representation can be challenging, and often leads to suboptimal solutions. By using evolution to automate the search for an appropriate representation, we are able to evolve agents better able to learn. We will examine the applicability of this concept by using real time NeuroEvolving Augmenting Topologies (rtNEAT), an evolutionary algorithm for exploring the topological and synaptic weight space of neural networks, to evolve a function approximator for the Q-learning temporal difference method using an adaptive learning rate to more effectively adjust to dynamic conditions. This algorithm, real time NeuroEvolving Augmenting Topologies + Q-learning with Adaptive Learning Rate (rtNEAT+QwALR), will be compared in the dangerous foraging domain with adaptive assignments against rtNEAT and rtNEAT+Q.
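
The Q-learning piece of this combination can be illustrated in tabular form (an illustrative sketch only; the thesis evolves a neural function approximator, and the visit-count decay below is just one simple notion of an adaptive learning rate):

```python
# Tabular temporal-difference sketch: the per-state-action learning rate
# alpha decays with the visit count, so early experience moves estimates
# quickly while later experience refines them.

from collections import defaultdict

Q = defaultdict(float)      # Q[(state, action)] value estimates
visits = defaultdict(int)   # visit counts drive the adaptive rate
GAMMA = 0.9                 # discount factor

def update(s, a, reward, s_next, actions):
    visits[(s, a)] += 1
    alpha = 1.0 / visits[(s, a)]  # adaptive: shrinks as (s, a) is revisited
    target = reward + GAMMA * max(Q[(s_next, a2)] for a2 in actions)
    Q[(s, a)] += alpha * (target - Q[(s, a)])

update("s0", "go", 1.0, "s1", ["go", "stay"])
assert Q[("s0", "go")] == 1.0   # first visit: alpha = 1, target = 1
update("s0", "go", 0.0, "s1", ["go", "stay"])
assert Q[("s0", "go")] == 0.5   # second visit: alpha = 1/2
```

In rtNEAT+QwALR the table would be replaced by an evolved network and the rate schedule itself adapted to the changing environment.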

DOWNLOAD hr09-04.pdf


Brandon Blakeley(1). "Graph Algorithms for Multicores with Multilevel Caches (Spring 2009)". The University of Texas at Austin, Department of Computer Sciences(1). Report# HR-09-03 (honors theses). May 19th, 2009. 27 pages.

Historically, the primary model of computation employed in the design and analysis of algorithms has been the sequential RAM model. However, recent developments in computer architecture have reduced the efficacy of the sequential RAM model for algorithmic development. In response, theoretical computer scientists have developed models of computation which better reflect these modern architectures. In this project, we consider a variety of graph problems on parallel, cache-efficient, and multicore models of computation. We introduce each model by defining the analysis of algorithms on these models. Then, for each model, we present current results for the problems of prefix sums, list ranking, various tree problems, connected components, and minimum spanning tree. Finally, we present our novel results, which include the multicore oblivious extension of current results on a private cache multicore model to a more general multilevel multicore model.
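
The blocked prefix-sums pattern that these multicore models analyze can be sketched sequentially (illustrative only; in the models each block's scan would run on its own core, and the function name is invented):

```python
# Blocked prefix sums: each "core" scans its block locally, a short
# sequential scan over the block totals computes per-block offsets, and a
# second parallel-per-block pass applies the offsets.

def blocked_prefix_sums(xs, blocks=4):
    n = len(xs)
    step = -(-n // blocks)  # ceiling division: block size
    chunks = [xs[i:i + step] for i in range(0, n, step)]

    # Pass 1 (parallel per block in the model): local inclusive scans.
    local = []
    for chunk in chunks:
        acc, scan = 0, []
        for x in chunk:
            acc += x
            scan.append(acc)
        local.append(scan)

    # Sequential scan over per-block totals yields each block's offset.
    offsets, total = [], 0
    for scan in local:
        offsets.append(total)
        total += scan[-1]

    # Pass 2 (parallel per block): add the block offset to each element.
    return [v + off for scan, off in zip(local, offsets) for v in scan]

assert blocked_prefix_sums(list(range(1, 9)), blocks=3) == [1, 3, 6, 10, 15, 21, 28, 36]
```

Cache-efficient and multicore-oblivious variants differ mainly in how the block size is chosen relative to the cache hierarchy rather than in this two-pass structure.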

DOWNLOAD hr09-03.pdf


Adam Setapen(1). "Exploiting Human Motor Skills for Training Bipedal Robots (Spring 2009)". The University of Texas at Austin, Department of Computer Sciences(1). Report# HR-09-02 (honors theses). May 19th, 2009. 24 pages.

Although machine learning, reinforcement learning, and learning from demonstration have improved the rate and accuracy at which robots can gain intelligence from humans, they haven't reached the rapid rate at which humans are able to acquire new knowledge. Many systems that exploit imitation learning use simple positive and negative reinforcement, and place the burden of learning completely on the computer. This neglects the expressive capabilities of humans, as well as their remarkable ability to quickly refine motor skills. While passive dynamics offers the most human-like locomotion for bipedal robots, it also relies on particular design specifications. This thesis presents a general Framework for Interactive Control of a Humanoid by Motion Capture (FICHMC) that offers rapid motion development for large classes of bipedal robots. Essentially, a human in a motion-capture laboratory "puppets" a biped, with a real-time mapping from human to robot. The training process requires no technical knowledge and provides a natural interface for humans to directly transfer skills to robots.

DOWNLOAD hr09-02.pdf


Ari N. Schulman(1). "ModFetch: A Modular System for Automating Queries to Relational Databases (Spring 2009)". The University of Texas at Austin, Department of Computer Sciences(1). Report# HR-09-01 (honors theses). May 19th, 2009. 19 pages.


DOWNLOAD hr09-01.pdf


Trevor Fountain. "FDL: A Feature Description Language for Semantic Role Labeling". The University of Texas at Austin, Department of Computer Sciences. Report# HR-08-01 (honors thesis). Spring 2008. 13 pages.

In this paper I describe a language for extracting features from parse trees, especially those used in semantic role labelling (SRL). The language, which I call FDL, is designed to be accessible to nonprogrammers yet powerful enough to express most features found in SRL literature. I argue that such a language is useful both as an engineering platform and as an educational tool. I present the syntax for FDL and describe its implementation, and evaluate its usefulness by investigating how well it expresses novel features used in the CoNLL 2005 shared task.

DOWNLOAD cs-08-01-fountain.pdf


Stephen W. Kent. "Dynamic Error Remediation: A Case Study with Null Pointer Exceptions". The University of Texas at Austin, Department of Computer Sciences. Report# HR-08-02 (honors thesis). Spring 2008. 22 pages.

Managed programming languages have improved the robustness of software by reducing some classes of errors. Despite these improvements, programs still contain errors. Even with help from existing static and dynamic analysis tools, the errors are often difficult to detect and can be even more difficult to successfully eliminate. This work introduces a new approach, called dynamic error remediation, that further improves the robustness of software, by tolerating some null pointer exceptions until a patch can be created to fix the exception. This paper describes dynamic error remediation, and its effectiveness, with different strategies for handling the exceptions. We show that dynamic error remediation is successful in allowing programs to continue execution and to do meaningful work in many cases. We then describe the bug suite created to test dynamic error remediation, as well as providing a methodology for creating other bug suites. Finally, we describe origin tracking, a JVM modification which exposes the origin of null values that cause null pointer exceptions. Because dynamic error remediation is successful in many cases, we assert that it is a viable strategy for increasing availability and reliability in many programs. While increasing availability over correctness is not ideal in all cases, we show that there are real examples where it is preferred, and we provide dynamic error remediation as a tool for developers and users to survive null pointer exception bugs.
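
The "tolerate until patched" idea can be illustrated with a toy handler (this is not the thesis's JVM mechanism; the helper and its logging are hypothetical, with Python's `None` playing the role of `null`):

```python
# Toy remediation sketch: intercept a "null" dereference, log where the bad
# value was observed (a stand-in for origin tracking), and substitute a
# neutral default so the program keeps doing useful work.

log = []

def remediate(obj, method, default):
    if obj is None:
        log.append(f"null dereference remediated in call to {method}")
        return default
    return getattr(obj, method)()

names = ["alice", None, "bob"]
upper = [remediate(n, "upper", "") for n in names]
assert upper == ["ALICE", "", "BOB"]  # execution continues past the null
assert len(log) == 1
```

The interesting engineering in the thesis is choosing the substitute value and proving the program still does meaningful work afterwards, which this sketch only gestures at.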

DOWNLOAD cs-08-02-kent.pdf


Adrian Mayorga. "A Combined Approach to Subsurface Scattering-Draft". The University of Texas at Austin, Department of Computer Sciences. Report# HR-08-03 (honors thesis). Spring 2008. 18 pages.

Current techniques for rendering multilayered translucent materials such as skin, marble, jade, and paper can achieve startling realism by modeling subsurface scattering. However, due to performance concerns, all of these techniques require fixing the material properties, including the depth and scattering parameters of each layer, before rendering begins. This makes it impractical to calculate scattering profiles on a per-pixel basis to render objects that have spatially varying parameters. We introduce a new method that combines ideas from two techniques to enable the rendering of multilayered materials with spatially varying parameters. Specifically, we combine hierarchical integration with a sum-of-Gaussians approximation of subsurface scattering diffusion profiles. This speeds up the process of computing the interactions between individual layers by orders of magnitude, allowing parameters to vary across the surface. We show that this technique allows for rendering of translucent materials with spatially varying properties with relatively low overhead.

DOWNLOAD cs-08-03-mayorga.pdf


David Reaves. "Aesthetic Image Rating (AIR) Algorithm". The University of Texas at Austin, Department of Computer Sciences. Report# HR-08-04 (honors thesis). Spring 2008. 35 pages.

Rapidly advancing technologies offer a greater volume of people the possibility to both create and consume information. And, with this widening of opportunity, the volume of digital information has increased in mammoth proportion. Indeed, this age of information is marked by quantity, but what of quality? It has become necessary to formulate a systematic method to sift through the vast amount of data. This paper presents an algorithm that seeks to emulate the manner by which a human might judge an image's aesthetic value. The notion that a machine could imitate human thought processes is not necessarily novel, and, as such, a fair amount of work has been done regarding algorithmic aesthetic digital image rating. Most of these proposed algorithms, however, have been unable to satisfactorily mimic actual human ratings. This paper builds on these past works and yet goes further by significantly improving on these prior accomplishments. The result of our focus on the discovery of an optimal vector of image features is a highly accurate emulation of human ratings.

DOWNLOAD cs-08-04-reavesr.pdf


Jeffrey Rego. "Graphical Viewing of Relationships Extracted from Online Articles". The University of Texas at Austin, Department of Computer Sciences. Report# HR-08-05 (honors thesis). Spring 2008. 8 pages.

This paper discusses an approach to extracting and viewing relationships between named entities from news articles. It covers the preprocessing, parsing, extraction, filtering, and visualization of this information, starting from online news articles and ending with a visual graph of concepts they contain.

DOWNLOAD cs-08-05-rego.pdf


Mickey Ristroph. "A Component-Oriented Approach to Simultaneous Localization and Mapping". The University of Texas at Austin, Department of Computer Sciences. Report# HR-08-06 (honors thesis). Spring 2008. 14 pages.

The simultaneous localization and mapping (SLAM) problem is central to many mobile robots. The construction of a map of the local environment and the localization of the robot in that map must be accomplished in an incremental manner, even in the presence of significant error and uncertainty in sensor data. Further, the cyclic data dependency presents a challenge that does not lend itself to robust solutions. Solutions to the SLAM problem have been a major success in robotics research in the past 20 years. Existing solutions fully embrace the self-referential nature of the problem. There is a certain art to designing and tuning systems that achieve stability and convergence properties. Most robotics platforms implement a customized version of SLAM with modifications to improve performance, given certain assumptions and a priori knowledge of the environment. Borrowing techniques and tools from the parallel composition research community, we aimed to design a robust, extensible, and efficient framework for SLAM solutions using a component-based architecture. This begins with a domain analysis, characterizing the breadth of existing solutions and factoring the logical function of various components in a modular way. We then define the interfaces for these components, provide implementations, and connect them in a data flow or dependency graph. The final implementation presented supports, in theory, particle filter localization and can benefit from automated parallelization. However, it has not yet run successfully at the time of writing. The reasons for this include missing key features and non-working implementations of purported features in the tools used, PCOM2 and CODE. These limitations are described in detail and motivated, and should serve as justification for future work in parallel component composition research. Unfortunately, these are the only presentable results of the research done here at this time.

DOWNLOAD cs-08-06-ristroph.pdf


Michael Romer. "Beamforming Adaptive Arrays with Graphics Processing Units". The University of Texas at Austin, Department of Computer Sciences. Report# HR-08-07 (honors thesis). Spring 2008. 12 pages.

Beamforming is a signal processing technique by which an array of receivers sensitive to signals from all directions can be processed to form one larger, more sensitive receiver that can identify the direction from which signals originate. Conventional beamforming methods can allow signals from noisy interferers to mask signals of interest if these interferers lie close to those directions to which the beamformer is sensitive. Adaptive beamforming (ABF) attempts to overcome this by minimizing the beamformer's output subject to certain constraints. At its core ABF is an optimization problem, and a robust ABF procedure that consistently provides an optimal solution is computationally expensive. Nevertheless, ABF is of particular interest to the US Navy, where personnel trained to analyze acoustic data from sonar receivers can locate and track quiet targets of interest, e.g. submarines, that may be masked by sources of both ambient and directional noise in the ocean. ABF can be implemented to operate concurrently on sets of frequency-independent data, thus making ABF well-suited for parallel processing. Additionally, due to ABF's high density of arithmetic operations, it is a suitable candidate for implementation on modern graphics processing units (GPUs). GPUs have been designed to quickly perform many concurrent arithmetic operations on large amounts of data. Furthermore, as of early 2007 they have reached a point at which they are not only capable of performing general-purpose tasks completely unrelated to graphics but also can be programmed to do such tasks far more easily and more naturally than has previously been possible. I show a method for parallelizing an existing serial ABF algorithm on an NVIDIA GeForce 8800 GTX, one of the first GPUs to use a generalized stream processor-based architecture. I take a single program, multiple data (SPMD) approach where the same software kernel executes over multiple blocks of frequency-independent data in parallel.
Further parallelism is exploited by subdividing each block into smaller subsets, independent of the direction the array is steered, and operating on these concurrently. Although initial results indicate that the GPU-based beamformer yields lower throughput than its serial counterpart, a number of possible optimizations are discussed that can allow the GPU implementation to match, if not exceed, the serial implementation.
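
The per-frequency independence that makes ABF parallelizable can be made concrete with a small sketch. Below is a minimal NumPy illustration (not the thesis's GPU code) that computes minimum-variance (Capon) adaptive beamformer weights independently for each frequency bin; the array sizes, steering vectors, and data are all illustrative assumptions.

```python
import numpy as np

def mvdr_weights(R, a):
    """Minimum-variance (Capon) beamformer weights for one frequency bin:
    w = R^{-1} a / (a^H R^{-1} a), where R is the sample covariance and
    a is the steering vector."""
    Ra = np.linalg.solve(R, a)
    return Ra / (a.conj() @ Ra)

# Toy data: a 4-sensor array and 3 independent frequency bins.
rng = np.random.default_rng(0)
n_sensors, n_bins, n_snap = 4, 3, 64
# Steering vectors (assumed known per bin) and sample covariance matrices.
A = np.exp(1j * rng.uniform(0, 2 * np.pi, (n_bins, n_sensors)))
X = rng.standard_normal((n_bins, n_sensors, n_snap)) \
  + 1j * rng.standard_normal((n_bins, n_sensors, n_snap))
R = np.einsum('bst,but->bsu', X, X.conj()) / n_snap  # per-bin X X^H / T

# Each bin is independent, so this loop is trivially parallel: the SPMD
# structure the thesis maps onto GPU thread blocks.
W = np.stack([mvdr_weights(R[b], A[b]) for b in range(n_bins)])

# The weights satisfy the unit-gain constraint in the steering direction.
for b in range(n_bins):
    assert np.isclose(W[b].conj() @ A[b], 1.0)
```

Because no bin reads another bin's data, the Python loop could be replaced by one GPU kernel launch over all bins without synchronization.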

DOWNLOAD cs-08-07-romer.pdf


Franziska Roesner. "Counting Dependence Predictors". The University of Texas at Austin, Department of Computer Sciences. Report# HR-08-08 (honors thesis). Spring 2008. 25 pages.

Modern processors rely on memory dependence prediction to execute load instructions as early as possible, speculating that they are not dependent on an earlier, unissued store. To date, the most sophisticated dependence predictors, such as Store Sets, have been tightly coupled to the fetch and execution streams, requiring global knowledge of the in-flight stream of stores to synchronize loads with specific stores. This thesis proposes a new dependence predictor design, called a Counting Dependence Predictor (CDP). The key feature of CDPs is that the prediction mechanism predicts some set of events for which a particular dynamic load should wait, which may include some number of matching stores. By waiting for local events only, this dependence predictor can work effectively in a distributed microarchitecture where centralized fetch and execution streams are infeasible or undesirable. I describe and evaluate a distributed Counting Dependence Predictor and protocol that achieves 92% of the performance of perfect memory disambiguation. It outperforms a load-wait table, similar to the Alpha 21264, by 11%. Idealized, centralized implementations of Store Sets and the Exclusive Collision Predictor, both of which would be difficult to implement in a distributed microarchitecture, achieve 97% and 94% of oracular performance, respectively.

DOWNLOAD cs-08-08-roesner.pdf


Juan F. Sequeda. "On Direct Mapping for Integrating SQL Databases with the Semantic Web". The University of Texas at Austin, Department of Computer Sciences. Report# HR-08-09 (honors thesis). Spring 2008. 34 pages.

Relational databases are a critical source of Internet content. Therefore, the success of the Semantic Web hinges on enabling access to relational databases and their content by semantic methods. This paper surveys methods for the direct mapping of relational/SQL databases to the Semantic Web. Direct mapping means that the methods avoid making connections between a database and an existing domain ontology. Compared to wrapping methods that do integrate domain ontologies, an advantage of the direct mapping methods is that they are intrinsically completely automated. As an organizing principle we first suggest a correspondence between individual syntactic structures in SQL-DDL and the layers in the Semantic Web stack. This approach enables us to characterize each research effort with respect to the expressive hierarchy of the Semantic Web languages themselves. Building on the foundation of the prior work we define a direct mapping system, and show that it is complete with respect to all possible associations formed by combinations of primary-key and foreign-key constructs.
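
The core idea of a direct mapping (the table name becomes a class, columns become predicates, and each row becomes an RDF subject) can be illustrated in a few lines. The sketch below is a hypothetical minimal mapping for intuition only, not the system defined in the thesis; it ignores foreign keys, NULLs, and datatype handling, and all names are invented.

```python
def direct_map(table, rows, pk, base="http://example.org/"):
    """Map each row of an SQL table to RDF triples (subject, predicate, object).
    Subject IRIs are minted from the primary-key value, column names become
    predicates, and the table name becomes an rdf:type class."""
    triples = []
    for row in rows:
        subj = f"<{base}{table}/{row[pk]}>"
        triples.append((subj, "rdf:type", f"<{base}{table}>"))
        for col, val in row.items():
            triples.append((subj, f"<{base}{table}#{col}>", repr(val)))
    return triples

# A two-row Person table yields six triples: one type triple plus one
# triple per cell for each row.
people = [{"id": 1, "name": "Alice"}, {"id": 2, "name": "Bob"}]
for t in direct_map("Person", people, pk="id"):
    print(t)
```

Because the mapping uses only the schema itself (no domain ontology), it is fully automatic, which is exactly the property the survey uses to distinguish direct mapping from wrapping approaches.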

DOWNLOAD cs-08-09-sequeda.pdf


Brandon Streiff. "Bullseye: A System for Targeted Test Generation". The University of Texas at Austin, Department of Computer Sciences. Report# HR-08-10 (honors thesis). Spring 2008. 11 pages.

This paper describes the architecture and implementation of the Bullseye automatic test generation system. Bullseye differs from existing systems in that it is targeted: by using program analysis, it is able to guide testing to areas that influence or are influenced by areas deemed to be ‘interesting’: that is, locations that are more important to test. By not treating all paths equally and instead stressing these areas of influence, Bullseye guides branch selection to test only these areas. As a consequence, Bullseye deals with a manageable subset of program paths that still encapsulate relevant program behavior.

DOWNLOAD cs-08-10-streiff.pdf


Jiayun Chen. "Markov Logic Network for Information Extraction". The University of Texas at Austin, Department of Computer Sciences. Report# HR-08-11 (honors thesis). Spring 2008. 19 pages.


DOWNLOAD cs-08-11-chen.pdf


Chen, Mo. "Measuring and Improving the Performance of Cache-efficient Priority Queues in Dijkstra's Algorithm". The University of Texas at Austin, Department of Computer Sciences. Report# HR-07-36 (honors thesis). Spring 2007. 20 pages.

The priority queue is a useful data structure in computation. There currently exist many implementations of this data structure, including some that are cache-aware and some that are cache-oblivious. In this study, we compare the performance of several implementations of priority queues in Dijkstra's Single Source Shortest Path algorithm. We compare high-performance heaps, such as the 4-ary Aligned Heap, Fast Binary Heap, and Sequence Heap, against in-house heap implementations, namely Buffer Heap and Auxiliary Buffer Heap, and against well-known implementations such as the textbook Binary Heap. We focus our analysis on the benefit of supporting decrease-key operations within the priority queue. Results indicate that graph density affects the relative performance of the different priority queues, and that using the Decrease-Key operation incurs unexpected performance hits. Furthermore, we propose a parallel version of Buffer Heap which remains cache-oblivious and scales to a high level of parallelism.
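
The decrease-key trade-off studied above can be made concrete. Many simple heaps (including Python's `heapq`, used here purely for illustration) provide no decrease-key; the standard workaround is to push a duplicate entry with the smaller key and skip stale entries on pop. A minimal sketch of Dijkstra's algorithm in that style (not the thesis's benchmark code):

```python
import heapq

def dijkstra(adj, src):
    """Dijkstra without decrease-key: instead of updating an entry's key
    in place, push a duplicate and discard stale entries when popped."""
    dist = {src: 0}
    pq = [(0, src)]
    while pq:
        d, u = heapq.heappop(pq)
        if d > dist.get(u, float('inf')):
            continue  # stale entry; a decrease-key queue would have updated it
        for v, w in adj.get(u, []):
            nd = d + w
            if nd < dist.get(v, float('inf')):
                dist[v] = nd
                heapq.heappush(pq, (nd, v))  # duplicate entry for v
    return dist

graph = {'a': [('b', 4), ('c', 1)], 'c': [('b', 2)], 'b': [('d', 1)]}
print(dijkstra(graph, 'a'))  # {'a': 0, 'b': 3, 'c': 1, 'd': 4}
```

The duplicates inflate the queue (relevant on dense graphs), while true decrease-key keeps the queue small at the cost of extra bookkeeping per update; that tension is what the experiments above measure.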

DOWNLOAD cs-07-36-chen.pdf


Zacharski, Adam. "Xen and the Art of Distributed Virtual Machine Management". The University of Texas at Austin, Department of Computer Sciences. Report# HR-07-35 (honors thesis). Spring 2007. 24 pages.

In this thesis report, a method for automating the creation and management of many virtual machines across multiple physical machines is described. By using the Xen paravirtualization software as a base, this method adds tools to create new virtual machine images and allows for live migration without any human intervention. This method uses management agents that run on the physical and virtual machines. There is also a management GUI that can be run from any computer on the network. All of these agents (the management clients and the GUI) communicate using a publish/subscribe based messaging system. For example, when a machine is overloaded the server agent on that machine broadcasts a message to the publish/subscribe message bus that contains a request for help. This system was evaluated on a network of four dual processor dual core servers. Multiple high load virtual machines were started on a single physical machine. The system was then monitored to make sure that the virtual machines were evenly distributed among the other physical machines on the network. The test results show that this method for virtual machine management could be used for the management of a few hundred virtual machines.
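
The agent interaction described above can be sketched with a toy in-process bus. The thesis's actual messaging system, topics, and message formats are not specified here, so every name below is illustrative.

```python
class Bus:
    """Minimal publish/subscribe bus: handlers register for a topic and
    receive every message published to it."""
    def __init__(self):
        self.subs = {}

    def subscribe(self, topic, handler):
        self.subs.setdefault(topic, []).append(handler)

    def publish(self, topic, msg):
        for handler in self.subs.get(topic, []):
            handler(msg)

bus = Bus()
offers = []

# An idle host's agent listens for overload broadcasts and offers to
# accept a migrated VM.
bus.subscribe("overload", lambda msg: offers.append(("host-b", msg["vm"])))

# An overloaded host's agent broadcasts a request for help, as in the
# example in the abstract.
bus.publish("overload", {"host": "host-a", "vm": "vm-17"})
print(offers)  # [('host-b', 'vm-17')]
```

Decoupling senders from receivers this way is what lets new physical hosts join the pool without any agent holding a global membership list.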

DOWNLOAD cs-07-35-zacharski.pdf


Roche, David Lan. "Experimental Study of High Performance Priority Queues". The University of Texas at Austin, Department of Computer Sciences. Report# HR-07-34 (honors thesis). Spring 2007. 35 pages.

The priority queue is a very important and widely used data structure in computer science, with a variety of applications including Dijkstra’s Single Source Shortest Path algorithm on sparse graph types. This study presents the experimental results of a variety of priority queues. The focus of the experiments is to measure the speed and performance of highly specialized priority queues in out-of-core and memory intensive situations. The priority queues are run in-core on small input sizes as well as out-of-core using large input sizes and restricted memory. The experiments compare a variety of well-known priority queue implementations such as Binary Heap with highly specialized implementations, such as 4-ary Aligned Heap, Chowdhury and Ramachandran’s Auxiliary Buffer Heap, and Fast Binary Heap. The experiments include Cache-Aware as well as Cache-Oblivious priority queues. The results indicate that the high-performance priority queues easily outperform traditional implementations. Also, overall the Auxiliary Buffer Heap has the best performance among the priority queues considered in most in-core and out-of-core situations.
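
For intuition about why a 4-ary heap is cache-friendly: the four children of node i occupy consecutive slots 4i+1 through 4i+4, so an aligned layout can place each sibling group within one cache line and a sift-down touches one line per level. Below is a small sketch of the index arithmetic and sift-down only (illustrative, without the alignment padding; not the instrumented implementations benchmarked in the thesis).

```python
D = 4  # arity; children of i are contiguous, enabling cache-line alignment

def children(i):
    first = D * i + 1
    return range(first, first + D)

def parent(i):
    return (i - 1) // D

def sift_down(heap, i):
    """Restore the min-heap property below index i by repeatedly swapping
    with the smallest child."""
    n = len(heap)
    while True:
        kids = [c for c in children(i) if c < n]
        if not kids:
            return
        m = min(kids, key=lambda c: heap[c])
        if heap[m] >= heap[i]:
            return
        heap[i], heap[m] = heap[m], heap[i]
        i = m

def heapify(a):
    """Bottom-up heap construction over all internal nodes."""
    for i in range(len(a) // D, -1, -1):
        sift_down(a, i)
    return a

print(heapify([9, 3, 7, 1, 8, 2, 5]))
```

A higher arity also roughly halves the tree height relative to a binary heap, trading a few extra comparisons per level for fewer memory accesses, which is why these heaps shine in the out-of-core experiments.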

DOWNLOAD cs-07-34-roche.pdf


Moldenhauer, William. "Bringing Verification to a Virtual World". The University of Texas at Austin, Department of Computer Sciences. Report# HR-07-33 (honors thesis). Spring 2007. 29 pages.

Second Life is a virtual world where avatars representing real entities interact through scripted transactions. Avatars may be anonymous so that relationships from the real world do not carry into Second Life, and their trustworthiness may not be apparent. Our goal is to apply software verification techniques to establish a basis for trust in Second Life within the realm of scripted transactions. Even though Second Life is a virtual world, it has an economy that provides very real financial opportunity as well as risk. Scripts can be used to help manage assets of financial value in Second Life, but can they be trusted to handle financial assets appropriately? This project aims to provide a solution to this problem by automatically generating models of Second Life scripts that can be verified through the use of traditional software verification tools. This verification capability serves as an initial step towards a certification system for Second Life, providing a framework for formally verifying that Second Life scripts correctly implement their desired behaviors and certifying scripts with varying levels of trust, based on the properties they can be formally shown to satisfy.

DOWNLOAD cs-07-33-moldenhauer.pdf


Marker, Bryan. "On Composing Matrix Multiplication from Kernels". The University of Texas at Austin, Department of Computer Sciences. Report# HR-07-32 (honors thesis). Spring 2007. 21 pages.

Matrix multiplication is often treated as a basic unit of computation in terms of which other operations are implemented, yielding high performance. In this paper initial evidence is provided that there is a benefit to be gained when the lower-level kernels from which matrix multiplication is composed are exposed. In particular, it is shown that matrix multiplication itself can be coded at a high level of abstraction as long as interfaces to such low-level kernels are exposed. Furthermore, it is shown that higher-performance implementations of parallel matrix multiplication can be achieved by exploiting access to these kernels. Experiments on Itanium2 and Pentium 4 servers support these insights.
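
The layering argued for above can be sketched as a blocked matrix multiply written entirely in terms of calls to a lower-level "block times panel" kernel, so the high-level loop structure never touches individual elements. This is a plain NumPy illustration of that composition with arbitrary block sizes; the kernel stand-in and names are invented, not the paper's optimized code.

```python
import numpy as np

def gebp_kernel(C_blk, A_blk, B_panel):
    """Stand-in for a low-level block-times-panel kernel (in optimized BLAS
    implementations this level is hand-tuned); here it is plain NumPy.
    Accumulates A_blk @ B_panel into C_blk in place."""
    C_blk += A_blk @ B_panel

def gemm_blocked(A, B, mb=2, kb=2):
    """Matrix multiply composed purely from kernel calls, exposing the
    layering that the thesis argues should be a public interface."""
    m, k = A.shape
    _, n = B.shape
    C = np.zeros((m, n))
    for p in range(0, k, kb):          # loop over column panels of A / row panels of B
        for i in range(0, m, mb):      # loop over row blocks of A
            gebp_kernel(C[i:i+mb], A[i:i+mb, p:p+kb], B[p:p+kb])
    return C

A = np.arange(12.0).reshape(3, 4)
B = np.arange(8.0).reshape(4, 2)
assert np.allclose(gemm_blocked(A, B), A @ B)
```

With the kernel interface exposed, other operations (or a parallel matmul) can call `gebp_kernel` on their own block schedules instead of going through a monolithic matrix-multiply entry point.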

DOWNLOAD cs-07-32-marker.pdf


Madigan, Ryan. "Creating a High-Level Control Module for an Autonomous Mobile Robot Operating in an Urban Environment". The University of Texas at Austin, Department of Computer Sciences. Report# HR-07-31 (honors thesis). Spring 2007. 15 pages.

The DARPA Urban Challenge is a government-sponsored competition scheduled for November 2007. The challenge is to create a fully autonomous vehicle built upon a street-legal frame that can navigate an urban driving environment. Naturally, urban driving environments require the vehicle to obey traffic laws, interact properly with other vehicles on the road, and generally drive much like a human would. This functionality requires a good amount of sophisticated decision making that was not necessary to complete the 2005 DARPA Grand Challenge. This paper outlines our strategy for implementing a control module to effectively maintain awareness of our vehicle's high-level situation and to ensure that the vehicle is working toward completing its mission while following the rules of the road.

DOWNLOAD cs-07-31-madigan.pdf


Shanmugam, Perumaal. "Hardware Accelerated Ambient Occlusion Techniques on GPUs". The University of Texas at Austin, Department of Computer Sciences. Report# HN-06-30 (honors). Fall, 2006. 10 pages.

We introduce an approximate yet visually pleasing ambient occlusion technique running on real-time graphics hardware. Our method is a multi-pass algorithm that separates the ambient occlusion problem into high-frequency, detailed ambient occlusion and low-frequency, distant ambient occlusion domains, both capable of running independently and in parallel. The high-frequency approach works in image space to approximate the ambient occlusion caused by nearby occluders arising from fine surface detail. The low-frequency approach uses the intrinsic properties of a modern GPU to greatly reduce the search area for large and distant occluders with the help of a low-detail approximated version of the occluder geometry. Our current results show a promising trend in utilizing highly parallel stream processors (GPUs) to perform visually pleasing ambient occlusion in real time. We show that our ambient occlusion solution works on a wide variety of applications such as molecular data visualization, dynamic deformable animated models, and highly detailed geometry. Our algorithm demonstrates scalability using a multi-GPU environment, and is Direct3D 10-aware in many respects, allowing for a smooth adaptation to the anticipated fundamental changes in upcoming graphics hardware.

DOWNLOAD cs-06-30-shanmugam.pdf


Le, Hai-Son. "Algorithms for Identification of Patterns in Biogeography and Median Alignment of Three Sequences in Bioinformatics". The University of Texas at Austin, Department of Computer Sciences. Report# HN-06-29 (honors). Fall, 2006. 25 pages.

Dynamic programming is a common technique used to solve a variety of problems in many fields of Computer Science. In bioinformatics, the technique finds wide usage due to the natural recursive structure of many computational problems. In my undergraduate research, I have looked at two different applications of dynamic programming in bioinformatics: 1. Tree comparison to infer common patterns in biogeography. 2. Median alignment of three DNA, RNA, or protein sequences with affine gap cost. My contribution to both of these problems has been to develop and apply techniques to improve the running time of the traditional algorithm. For the first application, we have developed a sparsified dynamic programming algorithm that improves the running time from O(n^2.5 log n) to O(n^2) in most common cases. For the second application, I applied the cache-oblivious method for dynamic programs, developed by Rezaul Chowdhury and Vijaya Ramachandran, to obtain highly I/O-efficient algorithms. This thesis shows how two very different approaches can improve the performance of traditional dynamic programming algorithms and their applications in bioinformatics.

DOWNLOAD cs-06-29-le.pdf


Kret, Ehren A. "Sockets++: Safe and Secure Network Programming". The University of Texas at Austin, Department of Computer Sciences. Report# HN-06-28 (honors). Fall, 2006. 25 pages.

This research provides a more reliable and secure methodology and programming environment for developing network-enabled applications. The goal was achieved by designing and developing an Object-Oriented framework and APIs using C++. The contributions of this thesis are: 1. A new open source ANSI C++ standard Sockets++ iostreams framework and implementation that ensures that all network communication is not subject to common buffer overflow attacks. The framework utilizes robust iostreams buffering abstractions, consistency checking, and is compatible with OpenSSL, POSIX threads, and the C++ Standard Template Library. 2. A new open source ANSI C++ framework and implementation of the Tor onion-routing client that uses encryption and Sockets++ to enable application-level privacy and enhanced security when used with Sockets++. 3. A simple Web Proxy Server that is built from the previous components that enables additional privacy and anonymity when configured for use with standard web browsers. In particular, the proxy protects against snooping of a user's settings, etc.

DOWNLOAD cs-06-28-kret.pdf


Manohar, Ashu. "Testing a library of general-purpose action description". The University of Texas at Austin, Department of Computer Sciences. Report# HN-06-27 (honors). Summer, 2006. 23 pages.

Reasoning about actions and describing changes caused by the execution of these actions is an idea central to common sense reasoning. Action description languages have been developed to specify the effects and preconditions of actions using a logical framework. However, many of these formalisms suffer from a lack of generality and modularity [McCarthy, 1987]. A new Modular Action Description language (MAD) [Lifschitz & Ren, 2006] and a library of general purpose action descriptions [Erdogan, unpublished] written in MAD are being designed to solve these problems. This thesis is the first step towards testing the semantics of MAD and the ability of this library to succinctly and expediently describe new action domains.

DOWNLOAD cs-06-27-manohar.pdf


Youngblood, Carl. "A Component Based Approach to Automatic Pronoun Resolution over Varying Writing Styles". The University of Texas at Austin, Department of Computer Sciences. Report# HN-06-26 (honors). May, 2006. 21 pages.

Electronic media have made more information available than ever before. The result is that finding specific information can take longer than it should. Algorithms that can understand natural language could process an immense amount of written data into an organized system in a fraction of the time a person would take. The work reported here focuses on one specific aspect of the natural language processing task: the resolution of pronoun references. The goals of this project were: (1) to break apart the pronominal resolution process into a set of modules, each of which corresponds to one kind of information, extractable from the text containing the pronouns, that might help resolve which of the possible coreferents is correct; and (2) to evaluate the effectiveness of those individual modules when working with different text genres and writing styles. The input to the pronoun resolution system consists of S-Trees built by Dan Bikel's statistical parser. The pronoun resolution system is structured as a series of modules. Each module encodes a type of information that may or may not help determine the referent of each pronoun being considered. The modules can be activated or deactivated without altering the overall program functionality. The program was designed this way to isolate each module's contribution to the pronoun referent resolution system. This allowed us to observe the contribution of the different information sources each module represents over a variety of texts. We ran a series of experiments where texts with different properties were analyzed by the pronoun-resolution system. In each test we measured the effectiveness of the entire system as well as individual modules. We also attempted to determine which modules played the most significant roles in all the cases where the correct pronoun referent was found.
The data obtained from these experiments show that the module(s) with the greatest contribution to pronoun resolution accuracy depended on the style of text under analysis. Further, the data suggest that the same module will have a large contribution to pronoun resolution accuracy on all texts with similar writing styles. Sometimes the implemented modules were unable to accurately resolve a majority of the pronouns in a text. The data do not answer the question, “What is required to accurately find the referents for the pronouns in this text?” It is possible that some texts are sufficiently complex that no single module will be able to resolve the majority of their pronouns. However, it is also possible that more sophisticated syntactic and structural constraints or semantic augmentation could result in a module that would correctly find the majority of the correct coreferents in a text. The result is an experimental framework that organizes future work into finding more pairings between writing styles and the information sources that work best with them. One methodological issue is that a pronoun resolution system does not operate independently. It must interact with other standard natural language system components such as a part-of-speech tagger, a parser, and a lexicon. If one does not want to build every piece, it is possible to use off-the-shelf components. However, this leads to a host of system integration issues, since there is no agreement on a canonical set of part-of-speech tags, or on how a parse tree for a sentence should be structured.

DOWNLOAD cs-06-26-youngblood.pdf


Yang, Cheng-Fang. "Space-optimized Implementation of a Database Obfuscator for Group Privacy". The University of Texas at Austin, Department of Computer Sciences. Report# HN-06-25 (honors). May, 2006. 12 pages.

Privacy of data concerns data owners, customers, and marketers. Breaches in privacy populate headlines frequently, including one case where ID numbers and contact information of Hong Kong citizens who complained about the police in the past five years were exposed on the Internet [11]. The research community has proposed data privacy protection through statistical data randomization [1, 5], query auditing [10], secure multi-party computation [9], and private information retrieval [4]. We would like to protect the data differently by making imprecise queries to a database computationally infeasible to evaluate. In other words, we restrict allowed accesses to the database to those that should be computed quickly. We can do this through obfuscation, which effectively moves access control into the target database, allowing database owners to release the database knowing that only precise queries can be evaluated efficiently, with cost depending exponentially on the number of records that would satisfy the query. Thus, access becomes non-interactive: the owner or any trusted party does not have to oversee or approve every query submitted by the database user. Legal accesses are embodied by the owner-specified access/privacy policy, which describes the ideal functionality of the obfuscated database by specifying which queries can be satisfied efficiently. Ideal functionality is an abstraction that allows us to decide whether the obfuscator is secure [7]. We can imagine a trusted third party who provides the ideal functionality by filtering all accesses from a simulator. We will consider our construction secure if an attacker's actions can be emulated by this simulator and trusted third party. Because our construction leaks meta-information about the content of the database, we work with a weaker notion of obfuscation than a "virtual black box" [2]. The construction is still useful because the leakage does not compromise the security of access control, which is our primary focus.

DOWNLOAD cs-06-25-yang.pdf


Whiteford, David. "Design of Standardized Representations for Animated Scenes for Ray Tracing". The University of Texas at Austin, Department of Computer Sciences. Report# HN-06-24 (honors). May, 2006. 20 pages.

In computer graphics, the process of creating an image generally consists of several steps: a scene file is created, the scene file is loaded into a graphics system, and finally the system creates the image in a process called rendering. The scene file stores information about the scene from which the image will be created. More specifically, a scene file generally consists of a collection of objects, information about where the objects are and how they look, lights to illuminate the scene, the viewpoint from which the scene is seen, and, if the scene is animated, information about how everything in the scene changes over time. The file format used to represent the scene has a large impact on the system as a whole. If the format is not robust enough, then there will be some useful graphics effects that will be hard or impossible to represent within the file. If the format provides too many features, then it becomes harder to understand and integrate into a system. If the format does not organize the data it represents in a good way, then the system will have a hard time rendering the image in an efficient manner. Thus, a format suitable for the system being used and task being performed must be chosen. Two of the most commonly used methods of rendering are Z-buffering and ray tracing. Z-buffering is a method of approximating what a scene looks like, and is relatively computationally inexpensive. Ray tracing, on the other hand, is more realistic, but is much more computationally expensive. Even though ray tracing is more realistic, it is not feasible to do in real time on modern hardware, which has led Z-buffering to be the most widely used algorithm for many applications, especially those that are real-time. As such, most graphics hardware available today is designed specifically for Z-buffering. As hardware in general and algorithms advance, however, ray tracing is becoming more feasible.
To make the switch from Z-buffering to ray tracing, several things need to happen: new, more advanced algorithms must be discovered, and faster hardware, perhaps even hardware targeted towards ray tracing, must be developed. Also of importance is designing a file format suitable for this task, which is the focus of this paper. The goal of this paper, specifically, is to evaluate design alternatives for representing animated scenes to be used in ray tracers. First, existing graphics file formats that address related goals will be discussed and evaluated. Next, the exact requirements for a file format for animated scenes to be used in ray tracers will be assessed. Finally, a suitable file format, as well as thoughts on what would be required to implement support for the given format, will be presented.

DOWNLOAD cs-06-24-whiteford.pdf


Wang, Shan. "A First Step Towards Automated Knowledge Integration". The University of Texas at Austin, Department of Computer Sciences. Report# HN-06-23 (honors). May, 2006. 57 pages.

Knowledge integration is the process of incorporating new information into a body of existing knowledge (Murray and Porter, 1989). This process is a cycle of knowledge acquisition, formulation, recognition, elaboration, and adaptation. Knowledge acquisition is the act of gathering facts. These facts must be formulated into the language of the underlying representation of some knowledge base. Recognition is a preliminary identification of what piece of existing knowledge is most closely related to the new information. Elaboration makes inferences that are based on combining the new information with the existing knowledge. Adaptation involves moderating inconsistencies between the new information and the existing knowledge base. Let's look at each of these steps in the context of an example. A student in a biology class acquires knowledge through reading a chapter about muscles in a biology textbook. He must then interpret syntactic and semantic constructs in each sentence, perhaps to learn that each muscle fiber is composed of both thick and thin myofilaments. Here, he has to reformulate the knowledge to understand the content. He recognizes from previous learning that every muscle is composed of muscle fiber. Thus the information about myofilaments is an extension of what he already knew about muscle fibers. In this scenario, there were no conflicts caused by the new knowledge, and there was no need to adapt the existing model. However, in the general case, new information may reveal gaps in existing knowledge or even refute existing knowledge. This process can be repeated on a set of facts. As more knowledge is integrated into a system, it can make further inferences and align pieces of information which it previously had no basis to work with. Automating the process aims to create knowledge bases correctly and efficiently.

DOWNLOAD cs-06-23-wang.pdf


Ulrich, Jan. "An Analysis of the 2005 TAC SCM Finals". The University of Texas at Austin, Department of Computer Sciences. Report# HN-06-22 (honors). May, 2006. 10 pages.

Supply chains are the basis of the world economy. With a computer simulation of a supply chain management scenario, the Trading Agent Competition in Supply Chain Management (TAC SCM) promotes research aimed at automating the process. When computer agents optimize the manufacturing process, consumer goods become cheaper. Presented is an analysis of the 2005 Finals of the Trading Agent Competition in Supply Chain Management. Statistical methods identified the agent attributes most important to success. The different agents in the finals were then evaluated on those attributes.

DOWNLOAD cs-06-22-ulrich.pdf


Tong, Lingling. "Implementation and Experimental Evaluation of the Cache-oblivious Buffer Heap". The University of Texas at Austin, Department of Computer Sciences. Report# HN-06-21 (honors). May, 2006. 10 pages.

During the last decade CPU speed has increased, but speedup in memory access has not kept pace. Computer memory is divided into hierarchies with varying levels. A trade-off exists between small and fast memory versus large and slow memory. To simplify I/O analysis, memory can be thought of as divided into two levels, where computation on data in main memory is fast and going to external memory is slow. With the emergence of applications that operate on large sets of data, the need for I/O-efficient algorithms has risen. As an example, the online travel aid MapQuest provides directions for getting from destination A to destination B. The data can be imagined as a graph, with vertices representing cities and edges representing connecting streets. Dijkstra's shortest path algorithm, which uses the priority queue data structure augmented with the decrease-key operation, can be used to compute the shortest path between A and B. Because of the large data size and the growing gap between CPU speed and memory access, efficient algorithms need to consider the non-trivial costs of I/O. This thesis takes one data structure, a cache-oblivious priority queue called the buffer heap, and examines its practical running time. The goal is to see whether the buffer heap is a viable alternative to the binary heap when working with large datasets.

DOWNLOAD cs-06-21-tong.pdf


Silverman, Dustin. "The Immersion Interface Concept". The University of Texas at Austin, Department of Computer Sciences. Report# HN-06-20 (honors). May, 2006. 8 pages.

This paper presents a concept for organizing and interfacing with 3D windowing systems. Its intention is to provide a means of replacing the 2D desktop metaphor. The replacement metaphor is that of an office, a concept that more directly maps to 3D space.

DOWNLOAD cs-06-20-silverman.pdf


Resnick, Kevin. "Call Graph Correction Using Control Flow Constraints". The University of Texas at Austin, Department of Computer Sciences. Report# HN-06-19 (honors). May, 2006. 52 pages.

Dynamic optimizers for object-oriented languages collect a variety of profile data to drive optimization decisions. In particular, the dynamic call graph (DCG) informs key structural optimizations such as which methods to optimize and how to optimize them. Unfortunately, current low-overhead call-stack hardware and software sampling methods are subject to sampling bias, which costs 40 to 50% in accuracy when compared with a perfect call graph. This paper introduces DCG correction, a novel approach that uses static and dynamic control-flow graphs (CFGs) to improve DCG accuracy. We introduce the static frequency dominator (FDOM) relation, which extends the dominator relation on the CFG to capture relative execution frequencies and expose static constraints on DCG edges, which we use to correct DCG edge frequencies. Using conservation-of-flow principles, we further show how to use dynamic CFG basic block profiles to correct DCG edge frequencies intraprocedurally and interprocedurally.
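
One way to picture the conservation-of-flow idea: if a basic-block profile says a call site executed n times, but the sampled DCG edges leaving that site sum to something else, rescale those edges so they sum to n. The sketch below is a simplified illustration of that one correction step under stated assumptions; the data shapes, names, and interfaces are invented for the example and are not the paper's actual algorithm.

```python
def correct_dcg(dcg_edges, block_counts):
    """Rescale sampled DCG edge frequencies so the edges out of each call
    site sum to the call site's true execution count from the CFG profile.

    dcg_edges:    {call_site: {callee: sampled_count}} (biased samples)
    block_counts: {call_site: true_execution_count}    (CFG block profile)
    """
    corrected = {}
    for site, callees in dcg_edges.items():
        sampled_total = sum(callees.values())
        true_total = block_counts.get(site, sampled_total)
        scale = true_total / sampled_total if sampled_total else 0.0
        corrected[site] = {f: c * scale for f, c in callees.items()}
    return corrected

dcg = {"main:12": {"foo": 30, "bar": 10}}   # biased samples: 40 total
blocks = {"main:12": 200}                    # CFG profile: site ran 200 times
print(correct_dcg(dcg, blocks))  # foo scaled to 150.0, bar to 50.0
```

Rescaling preserves the sampled ratio between callees while anchoring the absolute magnitudes to the (much cheaper to collect accurately) block profile.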

DOWNLOAD cs-06-19-resnick.pdf


Nichols, Melissa. "Battery Life". The University of Texas at Austin, Department of Computer Sciences. Report# HN-06-18 (honors). May, 2006. 40 pages.

“Battery Life” is a short story exploring issues related to the ownership of artificially-intelligent beings. It seeks to address the following questions: Can a sentient being accept subservience? Can its loyalties be altered? Is the relationship slavery, or something more benign? Through Maya, the story’s android protagonist, “Battery Life” argues that the answers are not predetermined—that any body, imbued with intelligence and education, may choose a life in stark contrast to the one appointed to it.

DOWNLOAD cs-06-18-nichols.pdf


Mahdavi, Zack. "On Improving Heuristics for Maximum Likelihood and Multiple Sequence Alignment in Phylogenetic Reconstruction". The University of Texas at Austin, Department of Computer Sciences. Report# HN-06-17 (honors). May, 2006. 62 pages.

In the construction of phylogenetic trees, exact methods for maximum likelihood and multiple sequence alignment are very computationally intensive, even on a fixed tree. Since anything above cubic time is too computationally complex to be practical in phylogenetics, heuristic methods are used to approximate a solution. We attempt to optimize maximum likelihood heuristics to improve performance and accuracy by eliminating “fast evolving sites.” In addition, we compare various multiple sequence alignment heuristics toward the overall goal of improving generalized tree alignment heuristic methods.

DOWNLOAD cs-06-17-li.pdf


Li, Bo. "Web Database (WDB): A Java Semantic Database". The University of Texas at Austin, Department of Computer Sciences. Report# HN-06-16 (honors). May, 2006. 11 pages.

With the widespread use of relational database systems such as MySQL and Oracle, the flaws of such systems become more apparent. While the relational model presents the database designer with a great degree of flexibility, it captures little meaning of the stored data and offers limited data integrity capabilities. This paper describes a database based on the semantic data model to capture the meaning of data so that the schema better represents its corresponding real world objects. A subset of the data definition and manipulation language is also explained. Furthermore, a detailed examination of implementing such a database management system in Java with the SleepyCat database engine is presented along with possible ways of utilizing dynamically compiling Java objects for data storage. By using the semantic data model, it is demonstrated how querying for entries in a hierarchic structure or with entity relationships is much simpler when compared to the equivalent SQL query.

DOWNLOAD cs-06-16-li.pdf


Issen, Laurel. "Using edge statistics for object recognition". The University of Texas at Austin, Department of Computer Sciences. Report# HN-06-15 (honors). May, 2006. 11 pages.

Many perception experiments have shown that object boundaries, defined by edges, contain a wealth of information for recognizing the object. This project is an attempt to construct an edge-based method of recognizing a simple object in a computer vision system. I chose to explore this problem within the domain of robotic soccer, where the visual system must identify the soccer ball among other robots, field features, and objects outside the field. The method presented in this paper uses only edge boundary statistics and performs better than chance at this task. I also outline future work that could improve accuracy in computer recognition of the soccer ball.

DOWNLOAD cs-06-15-issen.pdf


Ignatovich, Denis Andrey. "Quantitative Trading System". The University of Texas at Austin, Department of Computer Sciences. Report# HN-06-14 (honors). May, 2006. 20 pages.

My interests are in the study of market microstructure, that is, how trading takes place in markets and how those markets are organized. Models are designed to describe aspects of these organizations, and one needs flexible toolsets for model description and performance analysis. The current step in my research is an implementation of a quantitative trading system. Not only is this a challenging systems engineering project, but it is also a powerful mechanism for data analysis and trade algorithm description. A Trading Model is defined as an investment tool comprised of buy and sell recommendations. These recommendations are much more complex than simple price change forecasts. They must be tailored to the investor in terms of his or her risk exposure, past trading history, and the market microstructure with its own constraints on trade execution. A trading model has three main components: (1) generation of the trading recommendations; (2) accounting of the simulated transactions and their price impacts; (3) generation of the model statistics by the performance calculator. A Trading System is, in turn, an environment where users define and, through execution feedback, adjust their trading models. An essential part of the system is the user interface: it must be expressive enough to allow ease of use and, at the same time, powerful enough to describe the most sophisticated trading algorithms. As in forecasting and other applications, trading models rely heavily on the quality of financial data. This constraint on the Trading System, a supply of tick-by-tick data, is just one example of the many requirements of a functional real-time trading environment. The following paper describes an implementation of a quantitative trading system designed to incorporate features representative of a commercial-grade trading environment. The system that I propose includes programmatic access to the underlying execution framework, a powerful Python-based algorithm description environment, real-time data support, and mathematical interfaces, including support for ARCH-based equity volatility models. This paper, in part, fulfills the degree requirement for Bachelor of Science in Computer Sciences (Turing Scholars Option).

DOWNLOAD cs-06-14-ignatovich.pdf


Gebhart, Mark. "Implementation of a Streaming Parallel Model for the TRIPS Architecture". The University of Texas at Austin, Department of Computer Sciences. Report# HN-06-13 (honors). May, 2006. 51 pages.

One common method of improving performance is to use a collection of processors to execute a large-scale program in parallel. The Streaming Virtual Machine (SVM) model of computation is one method of extracting concurrency from traditional programs. In an SVM program, tasks are partitioned into kernels that consume and produce streams of data rather than traditional operations that consume and produce single data values. This streaming model is particularly advantageous on architectures such as TRIPS that contain a Direct Memory Access (DMA) controller, which allows communication and computation to overlap. The author developed several parallel benchmarks, along with enhancements to the TRIPS simulators, to evaluate the viability of the SVM model on the TRIPS system. Initial results show that the SVM model holds promise in several domains, and a prototype system of four processors achieves an average speedup of 3.70 over uniprocessor sequential execution on a collection of computation-bound benchmarks.

DOWNLOAD cs-06-13-gebhart.pdf


Elhassan, Ikrima. "An Analysis Of GPU-based Interactive Raytracing". The University of Texas at Austin, Department of Computer Sciences. Report# HN-06-12 (honors). May, 2006. 8 pages.

Raytracing is a simple and elegant solution to many realistic image synthesis problems such as global illumination. Recent hardware advances have made GPU-based raytracing possible. This paper explores the design space as well as the characteristics and limitations of interactive raytracing on this platform. We implement a GPU-based raytracer and compare it with both a native CPU raytracer and a modified CPU raytracer that mimics the GPU raytracer. We found that the GPU raytracer was almost always slower than the CPU raytracer, typically by a factor of two. The primary reason for the poor performance is the high cost of per-pixel branching, which we illustrate with a pathological test case that causes high ray divergence. Additionally, the constraints of the architecture mandate inefficient modifications to the kd-tree traversal routine.

DOWNLOAD cs-06-12-elhassan.pdf


Elavarasan, Vasanth. "Email Filters that use Spammy Words Only". The University of Texas at Austin, Department of Computer Sciences. Report# HN-06-11 (honors). May 12, 2006. 32 pages.

There are several types of email filters that can be used to classify emails as spam or non-spam. All these filters use the occurrence of spammy words (those typically found in spam emails) and non-spammy words (those typically found in non-spam emails) to compute the probability that a given email is spam. Some spam emails imitate non-spam emails by including passages on an unrelated subject matter. To solve this problem, we design an email filter using only spammy words. The success of an email filter is highly dependent on which words are used as the spammy words. We go on to define a method to identify the optimal set of spammy words to use in our filter.
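As an illustration of the idea (not the thesis's actual filter), a score built from a fixed spammy-word list alone can be sketched in Python. The word lists and per-word probabilities below are hypothetical; the point is that padding an email with innocuous text cannot lower the score, since non-spammy words are simply ignored.

```python
import math

def spam_score(email_words, spammy_words, p_word_given_spam, p_word_given_ham):
    """Log-odds that an email is spam, using only a fixed spammy-word set.

    Ignoring non-spammy words means passages of unrelated, innocuous
    text cannot dilute the score.
    """
    present = set(email_words)
    score = 0.0
    for w in spammy_words:
        if w in present:
            # each spammy word contributes its log likelihood ratio
            score += math.log(p_word_given_spam[w] / p_word_given_ham[w])
    return score

# Hypothetical word list and per-word probabilities for illustration.
spammy = {"winner", "viagra", "free"}
ps = {"winner": 0.4, "viagra": 0.3, "free": 0.5}   # P(word | spam)
ph = {"winner": 0.01, "viagra": 0.001, "free": 0.1}  # P(word | ham)

msg = "you are a winner claim your free prize".split()
print(spam_score(msg, spammy, ps, ph) > 0)  # True: classified as spam
```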

DOWNLOAD cs-06-11-elavarasan.pdf


Brown, Daniel. "Exploring Universe Polymorphism in Omega". The University of Texas at Austin, Department of Computer Sciences. Report# HN-06-10 (honors). May 6, 2006. 25 pages.

Omega extends Haskell with novel features for practical functional programming: GADTs, extensible kinds, and type functions. With both extensible types and extensible kinds in place, there is a tendency toward redundant datatype definitions, and likewise for functions that operate over these structures. Universe polymorphism is a way to abstract over levels in the typing hierarchy, unifying these redundant constructs. In this paper, we use Omega's novel features to encode simplified models that begin to explore the design space for universe polymorphism in Omega.

DOWNLOAD cs-06-10-brown.pdf


Boothe, Andy. "Closet: A Framework For Combining Scripting and Static Languages". The University of Texas at Austin, Department of Computer Sciences. Report# HN-06-09 (honors). May, 2006. 43 pages.

Scripting languages like Python and Ruby allow rapid development of software, but make no guarantees about the type safety of a program at compile time. While static languages like Java do make strong guarantees about the type safety of a program at compile time, they are not perceived to be as agile as their more dynamic counterparts. Is it possible to develop a statically-typed language that makes strong guarantees about type safety, but looks and acts like a scripting language?

DOWNLOAD cs-06-09-boothe.pdf


Zarnani, Hormoz. "An Incremental Approach to Verifying the Intentional Naming System Framework and Extensions". The University of Texas at Austin, Department of Computer Sciences. Report# TR-06-08 (honors thesis). December, 2005. 97 pages.

Lookup-Name is the name resolution algorithm for the Intentional Naming System (INS). We formalize this algorithm in the ACL2 logic and develop a specification for its correctness. We discuss certain difficulties that arise in our formalization of this algorithm which restrict our ability to prove its correctness. We describe a simpler version of the algorithm, specify it, and prove it correct. We discuss how the proof of the simple algorithm can be used as a guide to prove the original algorithm correct.

DOWNLOAD tr06-08.pdf


Richardson, Clare. "Rapid, High Precision Control in Tightly Constrained Environments". The University of Texas at Austin, Department of Computer Sciences. Report# TR-06-03 (honors thesis). December, 2005. 21 pages.

In autonomous robotics, environments that leave little room for error in movement, such as traveling through a doorway, can be troublesome. Some control methods can deal with these tasks only by severely reducing the speed of the robot. However, by combining speed and precision, smooth movement around obstacles is possible. Many control methods focus on tasks in relatively open environments, such as traveling down corridors, to test their speed and obstacle avoidance capabilities. I have chosen two such control methods: the Vector Field Histogram approach of Borenstein and Koren and the Dynamic Window approach of Fox, Burgard, and Thrun. I evaluated their ability in tightly constrained test environments using Vulcan, a robotic wheelchair, and implemented modifications to improve their usability in such environments.

DOWNLOAD tr06-03.pdf


Tran, Khoa D. "High Frequency Wave Propagation: a Convergent Approach, Part II: Implementation and Numerical Testing". The University of Texas at Austin, Department of Computer Sciences. Report# TR-06-02 (honors thesis). December 7, 2005. 25 pages.

Numerical approximation of wave propagation is prevalent in many applications. When the wave frequency is high, this is a computationally challenging problem and an active area of study. In this paper, I present the implementation and numerical testing of the convergent approach proposed by Bruno et al. and retheorized by Tran to solve problems of electromagnetic or acoustic scattering by convex obstacles. The problem is formulated as an integral equation. For the evaluation of the integral operators, a localized integration scheme is used. This, combined with a change of variables to resolve the problem of high slopes at the shadow boundaries, provides an efficient algorithm for solving the scattering problem for the Helmholtz equation at high frequencies, with a computational cost that is independent of the frequency for a fixed accuracy.

DOWNLOAD tr06-02.pdf


Williams, Paul Leighton. "Learning Visual Scene Descriptions: An Approach to Symbol Grounding". The University of Texas at Austin, Department of Computer Sciences. Report# TR-06-01 (honors thesis). December 2005. 36 pages.


DOWNLOAD tr06-01.pdf


Ryan Cornelius. "Improving Prescripted Agent Behavior with Neuroevolution". The University of Texas at Austin, Department of Computer Sciences. Report# HR-05-01 (honors thesis). March 01, 2005. 7 pages.


DOWNLOAD TR-2138.pdf


Hwang, Irvin and Peter Stone. "Discovering Conditions for Intermediate Reinforcement with Causal Models". The University of Texas at Austin, Department of Computer Sciences. Report# TR-05-28 (honors thesis). May 18, 2005. 18 pages.

Learning to perform a task in an environment with sparse feedback is a difficult problem. While several approaches for increasing feedback during learning have been taken, these methods depend on human knowledge and engineering to find good solutions. We propose using causal models to increase the amount of feedback and thereby improve learning. This approach does not require domain-specific human engineering, because causal models can be constructed directly from the environment using empirical data. Preliminary experiments and results show that causal models can be used to automatically discover conditions for applying intermediate feedback that accelerates learning.

DOWNLOAD tr05-28.pdf


Li, Nancy. "Parameterized Firewall Templates: A New Approach to Firewall Design". The University of Texas at Austin, Department of Computer Sciences. Report# TR-05-26 (honors thesis). May 23, 2005.




Gautam, Aashin. "Stamping Out Spam". The University of Texas at Austin, Department of Computer Sciences. Report# TR-04-19 (honors). May 10, 2004. 21 pages.

We describe a new spam-control service in this paper. This service starts by providing some k e-pennies to each user of a mail server, enabling the user to send k emails. Every time an email is sent or received, the balance of e-pennies in the user's account is decremented or incremented, respectively. Extra e-pennies can be obtained from participating e-banks, which allow users to buy or sell e-pennies.
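A minimal sketch of the e-penny bookkeeping the abstract describes; the class and method names are hypothetical, not taken from the paper.

```python
class EPennyAccount:
    """Sketch of the e-penny scheme: each user starts with k e-pennies;
    sending an email costs one, receiving one earns one, and an e-bank
    can top up a balance that runs out."""

    def __init__(self, k):
        self.balance = k

    def send(self):
        if self.balance <= 0:
            raise ValueError("no e-pennies left; buy more from an e-bank")
        self.balance -= 1  # every sent email is decremented

    def receive(self):
        self.balance += 1  # every received email is incremented

    def buy_from_bank(self, n):
        self.balance += n  # extra e-pennies obtained from an e-bank
```

The economics do the filtering: a bulk spammer exhausts its k-penny allowance and must keep buying, while a normal user, who receives roughly as much mail as it sends, stays near equilibrium.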

DOWNLOAD tr04-19.pdf


Boyed, Dipak Chand. "Analysis of the TRIPS Architecture". The University of Texas at Austin, Department of Computer Sciences. Report# TR-04-16 (honors). May 2004. 41 pages.

Introducing a new architecture has never been a trouble-free undertaking for computer architects. Growing complexity and compatibility issues often hinder the progress of new architectures even though technological trends demand rapid, wholesale changes. TRIPS is one such novel architecture; it targets the problems of wire delays, memory latencies, power consumption, and saturated parallelism. TRIPS aims to be a scalable, malleable, dynamically adaptive, and non-specialized architecture that supports diverse applications. This thesis analyzes the advantages and disadvantages of the TRIPS architecture and its prototype. Because this thesis is the first to evaluate the TRIPS architecture, it tries to establish a foundation on which future evaluation and analysis of the architecture can build. The goals of this thesis are threefold: it provides details of the tools developed to analyze the TRIPS architecture, reports metrics that capture the effects of the architecture's features on program behavior, and evaluates the efficiency of the TRIPS model by comparing it with the Alpha 21264 (RISC) machine on a set of common source programs.

DOWNLOAD tr04-16.pdf


Chen, Victor Yenwen. "Explicit Construction of Ramsey-type Graphs". The University of Texas at Austin, Department of Computer Sciences. Report# TR-04-14 (honors). May, 2004. 23 pages.

We survey some recent constructions of Ramsey-type graphs, large graphs with small clique and independent set sizes. Then, continuing the work of Grolmusz, we construct matrices over the ring Z mod pq, where p and q are distinct primes, such that the diagonal entries are 0 modulo pq and the off-diagonal entries are nonzero modulo pq. These matrices lead to a construction of Ramsey graphs. Our work simplifies Grolmusz's construction while still matching the best known constructive asymptotic bound.

DOWNLOAD tr04-14.pdf


Lin, Ellie. "Creation of a Fine Controlled Action for a Robot". The University of Texas at Austin, Department of Computer Sciences. Report# TR-03-55 (honors). Fall 2003. 13 pages.

A common problem facing roboticists is the creation of fine controlled actions for a robot that must be interspersed with a baseline motion. We define a fine controlled action to be one in which small errors can make the difference in the success or failure of the action. A baseline motion is one that is executed repeatedly over time, such as walking straight or remaining idle. We examine the considerations that can affect the success of a fine controlled action that transitions between baseline motions. We introduce a general technique for implementing a fine controlled action that transitions from and to a baseline motion using the example of pushing an elevator button. We implement this technique in the area of robotic soccer. Our results demonstrate that this technique can successfully create fine controlled actions for a robot.

DOWNLOAD tr03-55.pdf


Lin, Mei. "Congestion Control and an Analysis of TCP-friendly Rate Control". The University of Texas at Austin, Department of Computer Sciences. Report# TR-03-54 (honors). Fall 2003. 25 pages.

This paper presents a history of congestion control research and an analysis of a specific rate-based congestion control protocol, TCP-friendly rate control (TFRC). Two taxonomies of congestion control mechanisms are presented and later used to classify general end-to-end congestion control schemes, including window-based and rate-based methods. The classic TCP congestion control mechanisms are introduced as an instance of window-based congestion control. To address the research in congestion control for UDP flows, we discuss the datagram congestion control protocol (DCCP) and the congestion manager (CM). Implemented in DCCP, TFRC is examined in detail for its TCP-friendliness and its ability to provide smooth transmission for applications. The sender's and receiver's protocols are presented using AP notation, and the equations used in the protocols are analyzed for their conduciveness to TCP-friendliness and smooth transmission. Lastly, two empirical studies of TFRC performance are summarized to validate the effectiveness of the protocol.
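The equation at the heart of TFRC's TCP-friendliness is the TCP throughput equation from RFC 3448, which caps the sender's rate at roughly what a conforming TCP flow would achieve under the same loss rate and round-trip time. A sketch of the computation, with illustrative parameter values:

```python
import math

def tfrc_rate(s, R, p, t_rto=None, b=1):
    """Allowed sending rate per the TCP throughput equation (RFC 3448, Sec. 3.1).

    s: packet size in bytes, R: round-trip time in seconds,
    p: loss event rate (0 < p <= 1), t_rto: retransmit timeout
    (commonly approximated as 4*R), b: packets acknowledged per ACK.
    Returns bytes per second.
    """
    if t_rto is None:
        t_rto = 4 * R
    denom = (R * math.sqrt(2 * b * p / 3)
             + t_rto * (3 * math.sqrt(3 * b * p / 8)) * p * (1 + 32 * p ** 2))
    return s / denom
```

The rate falls as the loss rate p or the round-trip time R rises, which is exactly the smooth, TCP-compatible backoff behavior the paper analyzes.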

DOWNLOAD tr03-54.pdf


Miller, Lydia. "Security of the Grid Routing Protocol". The University of Texas at Austin, Department of Computer Sciences. Report# TR-03-53 (honors). December 2003. 41 pages.


DOWNLOAD tr03-53.pdf


Boyed, Saurabh. "A Semantic Database Management System: SIM". The University of Texas at Austin, Department of Computer Sciences. Report# TR-03-43 (honors). Spring 2003. 19 pages.

SIM is a database management system based on the semantic data model. The goal of this research project was the design and implementation of SIM. SIM, an abbreviation for Semantic Information Manager, uses a data model that captures the meaning of the data better than other database models do. This higher-level database model enables the database designer and the users to see a conceptual view of the database. This paper presents an overview of some of the benefits of the semantic data model and emphasizes how SIM incorporates it. We describe how SIM overcomes some of the weaknesses of other database modeling systems. Several SIM and SQL examples are also provided to illustrate this and to serve as the basis for comparative analysis. We also discuss some implementation considerations and the software tools used for the design and implementation of SIM. Finally, we propose some hitherto unapplied uses and applications of SIM that have the potential to streamline current database systems.

DOWNLOAD tr03-43.pdf


Lai, Yung-Chuan (Sam). "Automation of Equational Proof Verification Using Haskell". The University of Texas at Austin, Department of Computer Sciences. Report# TR-03-34 (honors). August 2003. 19 pages.

Mistakes in a proof are intolerable, because proofs are meant to assure the correctness of a theorem. Many proofs are still derived by humans today and are consequently error-prone. To minimize human errors, many rigorous proof styles have been developed; equational logic is one of them. Even so, it is still possible to make errors when writing an equational proof, so it would be helpful to have software that checks proofs. This paper presents a logical model and a Haskell implementation for checking equational proofs in propositional logic. After reading the paper, the reader should be able to recognize the main obstacles to be overcome in building a proof verification tool in Haskell.

DOWNLOAD tr03-34.pdf


Bhagchandka, Dhiraj. "Classification of Firewalls and Proxies". The University of Texas at Austin, Department of Computer Sciences. Report# TR-03-28 (honors). Spring 2003. 47 pages.


DOWNLOAD tr03-28.pdf


Singh, Neha. "Analysis of Search Algorithms and Tree Structures for Proximity Search in Metric Spaces". The University of Texas at Austin, Department of Computer Sciences. Report# TR-03-25 (honors). Fall 2002. 18 pages.


DOWNLOAD tr03-25.pdf


Low, Tze Meng. "Generic Directed Acyclic Graphs". The University of Texas at Austin, Department of Computer Sciences. Report# TR-03-24 (honors). May 16, 2003. 11 pages.


DOWNLOAD tr03-24.pdf


John, Chand T. "Thoughts on Hybrid Systems". The University of Texas at Austin, Department of Computer Sciences. Report# TR-03-22 (honors). May 14, 2003. 6 pages.

There is a growing recognition of the presence of dynamical systems in many fields. In areas ranging from economics to physics and computer science to biology, many open fundamental problems are known to relate to the central problems in dynamical systems theory. Physicists generally work more with continuous dynamical systems, rather than discrete dynamical systems which computer scientists may study. The difference is that a continuous dynamical system is almost always defined as a system with an evolution function that satisfies a certain set of differential equations, while a discrete dynamical system is described by a state space involving discrete states and events which trigger discrete transitions between them. However, as computer science has come into contact with problems such as traffic management, automotive control, and modeling biological cell networks, which involve both continuous and discrete dynamical systems, a new formalism has emerged. This new formalism, the concept of a hybrid system, combines both the continuous and discrete formalisms into one entity that has fueled much research in recent years. Here we will formally introduce the notion of a hybrid system as an extension (or special case) of the concept of a dynamical system that is appropriate for many problems of interest in a wide variety of fields. We will then present summaries of some papers written on hybrid systems in recent years and present ideas for future research in the field.

DOWNLOAD tr03-22.pdf


Martajaya, Jeson. "Transparent Access to Remote Services". The University of Texas at Austin, Department of Computer Sciences. Report# TR-03-18 (honors). May 16, 2003. 31 pages.

The emerging trend of distributed computing has given birth to various mechanisms for providing access to remote services. These services may take the form of dynamic libraries or custom-tailored applications. This paper suggests a simple and transparent mechanism for accessing these services that requires no central point, which allows it to be incorporated into peer-to-peer systems. The mechanism integrates discovery and invocation of the services into one solution, implemented as an application suite named Java Remote Dynamic Loader (JRDL).

DOWNLOAD tr03-18.pdf


Fiduk, Kenneth W. "No Just Yet - Giving Objects Time to Die for Garbage Collection". The University of Texas at Austin, Department of Computer Sciences. Report# TR-03-16 (honors). May 8, 2003. 24 pages.


DOWNLOAD tr03-16.pdf


Sridhar, Srinath. "A Heap-Based Optimal Inversions-Sensitive Sorting Algorithm". The University of Texas at Austin, Department of Computer Sciences. Report# TR-03-15 (honors). April 18, 2003. 24 pages.

We introduce a heap-based sorting algorithm that makes n lg (I/n) + O(n lg lg (I/n) + n) comparisons and runs in time O(n lg (I/n) + n), where I is the number of inversions in the input sequence and n is the number of items to be sorted. The coefficient 1 in the leading term for the number of comparisons matches the information-theoretic lower bound. The algorithm is simple and uses elementary data structures. This is the report of the undergraduate honors thesis of the author.
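The parameter I that the bound adapts to can be computed with a standard merge-sort pass; this sketch is not the thesis's heap-based sorter, only an illustration of what an inversion is. Nearly sorted inputs have small I, which is where an inversions-sensitive algorithm approaches linear work.

```python
def count_inversions(a):
    """Count pairs i < j with a[i] > a[j] via merge sort, in O(n log n)."""
    def sort(xs):
        if len(xs) <= 1:
            return xs, 0
        mid = len(xs) // 2
        left, li = sort(xs[:mid])
        right, ri = sort(xs[mid:])
        merged, inv = [], li + ri
        i = j = 0
        while i < len(left) and j < len(right):
            if left[i] <= right[j]:
                merged.append(left[i]); i += 1
            else:
                inv += len(left) - i  # every remaining left item exceeds right[j]
                merged.append(right[j]); j += 1
        merged.extend(left[i:]); merged.extend(right[j:])
        return merged, inv
    return sort(list(a))[1]

print(count_inversions([2, 1, 4, 3]))      # 2
print(count_inversions(list(range(8))))    # 0: sorted input, I = 0
```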

DOWNLOAD tr03-15.pdf


Heimlich, Marcel. "On Optimizing Collective Communication". The University of Texas at Austin, Department of Computer Sciences. Report# TR-03-14 (honors). May 2003. 17 pages.

It has long been thought that research into collective communication algorithms on distributed memory parallel computers has been exhausted. This paper demonstrates that the implementations available as part of widely-used libraries are still suboptimal. We demonstrate this through the implementation of the "reduce-scatter" collective communication and comparison with the MPICH implementation of MPI. Performance on a large cluster is reported.

DOWNLOAD tr03-14.pdf


Timothy Andersen. "Neuroevolution through Augmenting Topologies Applied to Evolving Neural Networks to Play Othello". The University of Texas at Austin, Department of Computer Sciences. Report# HR-02-01 (honors theses). March 01, 2002. 18 pages.


DOWNLOAD TR-2139.pdf


Pereira, Kevin. "An Empirical Investigation into a Location Sensing System Based on RF Signal Strength of TinyOS Motes". The University of Texas at Austin, Department of Computer Sciences. Report# TR-02-66 (honors). Fall 2002. 14 pages.

With the increasing usage of handheld, mobile wireless devices and the unprecedented popularity of cellular phones, the ability to know where a mobile device is physically located at any given time is fast becoming an intrinsic feature of any wireless application and service. This paper investigates the feasibility of using TinyOS motes to build, in incremental stages, a system that can locate tagged objects within an acceptable boundary of error.

DOWNLOAD tr02-66.pdf


Ahmed, Muqtadar. "The Anonymity Ring". The University of Texas at Austin, Department of Computer Sciences. Report# TR-02-65 (honors). Fall 2002. 38 pages.


DOWNLOAD tr02-65.pdf


He, Chun (Cindy). "Formal Specifications of Traceback Marking Protocols". The University of Texas at Austin, Department of Computer Sciences. Report# TR-02-42 (honors). May 2002. 21 pages.

Denial-of-Service attacks and Distributed Denial-of-Service attacks are serious security problems on the Internet due to their nature: they are easy to implement, hard to prevent, and very difficult to trace. This paper describes these attacks and presents various traceback protocols that have been proposed to identify the sources of such attacks on the Internet. Finally, it gives formal specifications for a class of these traceback protocols called marking protocols.

DOWNLOAD tr02-42.pdf


Lee, Delwin F. "A Formal Presentation of Electronic Commerce Protocols". The University of Texas at Austin, Department of Computer Sciences. Report# TR-02-33 (honors). May 23, 2002. 55 pages.

In this paper, we investigate the security of microcommerce digital cash protocols that can facilitate the selling of content over the Internet. Examples of such content are web pages, newspaper articles, and even individual plays of computer games. We focus on two particular digital cash protocols, namely Compaq's Millicent and IBM's Micropayments. For each of these two protocols, we present a formal specification using the Abstract Protocol notation, and then discuss how an adversary can attack the protocol using message forgery, modification, and replay. We then use three concepts of convergence theory, namely closure, convergence, and protection, to show that each protocol is secure against these attacks. Finally, we formally specify and verify the Secure Sockets Layer protocol, which can be used to provide privacy for these digital cash protocols.

DOWNLOAD tr02-33.pdf


Middleton, Brittany E. "A Brief Introduction to Nonstandard Analysis". The University of Texas at Austin, Department of Computer Sciences. Report# TR-02-31 (honors). May 22, 2002. 38 pages.

ACL2(r), authored by Dr. Ruben A. Gamboa, is an extension of ACL2 that allows reasoning about the real and complex irrational numbers. The modifications are based on nonstandard analysis. Jun Sawada, of IBM Research Labs, required the support of ACL2(r) in his proof efforts to verify that the approximation used by the square root function of the IBM Power4 processor had the required accuracy. Taylor's Theorem was chosen to verify the accuracy of the square root function. Taylor's Theorem with Remainder is an approximation tool commonly used in mathematics that provides a means for estimating f(x) for an arbitrary x in the interval [a, b] from the values of f and its derivatives at a. This thesis serves as an introduction to the work completed in the proof of Taylor's Theorem with Remainder, as well as a general, concise introduction to nonstandard analysis in ACL2(r). The introduction is presented as a guide through the foundational lemmas needed in the proof of Taylor's Theorem with Remainder.
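Taylor's Theorem with Remainder, in its standard Lagrange form (quoted here from the mathematical literature, not from the thesis), states that for f sufficiently differentiable on [a, b]:

```latex
f(x) \;=\; \sum_{k=0}^{n} \frac{f^{(k)}(a)}{k!}\,(x-a)^k \;+\; R_n(x),
\qquad
R_n(x) \;=\; \frac{f^{(n+1)}(\xi)}{(n+1)!}\,(x-a)^{n+1}
```

for some \xi strictly between a and x. Bounding R_n(x) is what yields the accuracy guarantee for the square root approximation.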

DOWNLOAD tr02-31.pdf


Noorani, Nizar. "Comparison of a Simulation System in Haskell vs. Java". The University of Texas at Austin, Department of Computer Sciences. Report# TR-01-60 (honors). December 20, 2001. 24 pages.

This paper examines the differences involved in implementing a program in an imperative language versus a pure functional language and discusses the advantages and disadvantages of these differences. In particular, this paper discusses the design of a simulation system and its implementation in the object-oriented language Java and the pure functional language Haskell. It then examines the two implementations and draws appropriate conclusions.

DOWNLOAD tr01-60.pdf


Shuja, Khawaja Usman. "Network Resource Management". The University of Texas at Austin, Department of Computer Sciences. Report# TR-01-57 (honors). March 2002. 10 pages.

This paper suggests a receiver-based scheduler to limit the rate at which data is received from other services. We implement Start-Time Fair Queuing in Active Names to schedule incoming data and examine its performance. The fundamental challenge lies in controlling data reception: it is relatively easy to schedule sends, since the sender controls its own buffer, but on the receiving side we must limit reception from other sources to allow fair allocation to all processes. Data reception can be controlled at three levels: (a) the sender, (b) the router, and (c) the receiver. Under normal circumstances all of these options can be employed, but a problem arises when the sender is untrusted, as it can consume bandwidth and cause denial-of-service attacks. Even when the sender does not intend to consume bandwidth, the nature of network traffic may cause it to be bursty, and the unpredictable behavior of bursty senders makes network resource management harder. We have built a receiver-based scheduler that distributes bandwidth fairly, using Start-Time Fair Queuing to schedule data received from steady senders, and achieves up to 92% efficiency under bursty conditions.
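The core of Start-Time Fair Queuing is tagging each arriving chunk with a start tag and serving chunks in start-tag order. The sketch below is a minimal illustration of that tagging rule applied on the receive side; the flow names, chunk sizes, and cost model (bytes divided by weight) are illustrative, and the thesis's Active Names integration is not reproduced here.

```python
import heapq

class SFQReceiver:
    """Minimal sketch of Start-Time Fair Queuing on the receive side."""

    def __init__(self):
        self.vtime = 0.0    # virtual time = start tag of chunk in service
        self.finish = {}    # last finish tag per flow
        self.queue = []     # heap of (start_tag, seq, flow, nbytes)
        self.seq = 0        # tie-breaker for equal start tags

    def enqueue(self, flow, nbytes, weight):
        # Start tag: max(current virtual time, flow's previous finish tag).
        start = max(self.vtime, self.finish.get(flow, 0.0))
        self.finish[flow] = start + nbytes / weight
        heapq.heappush(self.queue, (start, self.seq, flow, nbytes))
        self.seq += 1

    def dequeue(self):
        # Serve the chunk with the smallest start tag; advance virtual time.
        start, _, flow, nbytes = heapq.heappop(self.queue)
        self.vtime = start
        return flow, nbytes

sched = SFQReceiver()
sched.enqueue("bursty", 3000, weight=1.0)  # bursty sender queues a burst
sched.enqueue("bursty", 3000, weight=1.0)
sched.enqueue("steady", 1000, weight=1.0)
order = [sched.dequeue()[0] for _ in range(3)]
```

Because the bursty flow's second chunk inherits a large start tag, the steady flow is served in between, which is exactly the fairness property the scheduler relies on.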

DOWNLOAD tr01-57.pdf


Cheng, Pericles Leng. "Scheduling and Security for the Hypertext Transfer Protocol". The University of Texas at Austin, Department of Computer Sciences. Report# TR-01-56 (honors). Spring 2001. 40 pages.

The Hypertext Transfer Protocol is used to facilitate communication between "clients" (i.e., computers used to view information) and "servers" (i.e., computers used to store information). The communication between a client and a server proceeds in transactions: the client sends a request message to the server, and the server sends back a reply that contains the requested information. In this thesis, we identify two problems associated with the Hypertext Transfer Protocol and propose solutions for them. The first problem concerns pipelined HTTP request messages. Pipelining means that more than one request is sent to the server without waiting for a reply. If a connection failure occurs, the client sends all the requests again, and there is a possibility that the result or side effects of executing the same requests again may change. To eliminate this problem we need to check whether each request keeps the sequence idempotent, that is, whether the sequence can be executed again without changing its result or side effects. As a solution we introduce a look-ahead scheduler, which pipelines HTTP requests according to their idempotency. If a connection failure then occurs, we know that the sequence sent to the server again is idempotent and the result is always the same. The second problem concerns the insecurity of HTTP cookies, a mechanism used to keep the state of a session. A session is a sequence of related request-reply transactions between a client and a group of servers. During a session, the server needs to maintain the current state of that session; for example, it may need to maintain a list of all information that the client has requested so far. The server can do this by utilizing cookies. A cookie is a small piece of information created by the server to contain the current state of the session. It is sent to the client in the reply and stored on the client's computer.
When the client initiates the next transaction, the cookie that was stored on the computer is attached to the request before it is sent to the server. When the server receives the request, it uses the attached cookie to recall the current state of the session before it processes the request. (In an Internet store's shopping cart, for example, the server can save the current state of the shopping cart, i.e., the items selected by the customer, in a cookie and store it on the client's computer. When the customer visits the store again at a later date, the cookie is sent to the server and the previously saved shopping cart is restored.) In recent years, the use of cookies has become a problem, since they have been used in malevolent ways that jeopardize the privacy, integrity, and anonymity offered to individual users of the World Wide Web. The problem arises from the fact that cookies are stored on the client computer unencrypted and are transferred over the Internet with no security. We analyze the current use of Internet cookies and the security problems related to them, and we then suggest ways in which Internet cookies can be implemented so that they are more secure.
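The look-ahead scheduler's grouping rule can be sketched as follows, using the HTTP/1.1 method classes as the idempotency test (the thesis's criteria are more refined than a per-method classification): consecutive idempotent requests are pipelined together, while a non-idempotent request is sent alone and its reply awaited.

```python
# HTTP/1.1 methods defined to be idempotent (RFC 2616 section 9.1.2).
IDEMPOTENT = {"GET", "HEAD", "PUT", "DELETE", "OPTIONS", "TRACE"}

def schedule(requests):
    """Split a request sequence into pipelined batches so that every batch
    is idempotent and can be safely resent after a connection failure.
    Non-idempotent requests (e.g. POST) go into a batch by themselves."""
    batches, current = [], []
    for method, url in requests:
        if method in IDEMPOTENT:
            current.append((method, url))
        else:
            if current:
                batches.append(current)
                current = []
            batches.append([(method, url)])  # sent alone, reply awaited
    if current:
        batches.append(current)
    return batches

batches = schedule([("GET", "/a"), ("GET", "/b"),
                    ("POST", "/order"), ("GET", "/c")])
# Three batches: the two GETs pipelined, the POST alone, the last GET alone.
```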

DOWNLOAD tr01-56.pdf


Greer, Brian. "Numerical Optimization with Neuroevolution". The University of Texas at Austin, Department of Computer Sciences. Report# TR-01-49 (honors). December 2001. 10 pages.

Neuroevolution techniques have been successful in many sequential decision tasks such as robot control and game playing. This paper aims at establishing whether they can be useful in numerical optimization more generally, by comparing neuroevolution to Linear Programming in a manufacturing optimization domain. It turns out that neuroevolution can learn to compensate for uncertainty in the data and outperform Linear Programming when the number of variables in the problem is small and the required accuracy is low, but the current techniques do not (yet) provide an advantage in problems where many variables must be optimized with high accuracy.

DOWNLOAD tr01-49.pdf


Ahmed, Nabeel. "An Efficient Disk Resource Management Mechanism for Scalable Disconnected Access to Web Services". The University of Texas at Austin, Department of Computer Sciences. Report# TR-01-48 (honors). January 2002. 11 pages.

This paper examines the need for a disk resource management mechanism for disconnected services and presents a robust system that embodies several features required of such a disconnected framework. Disconnected services, in which clients access services in degraded mode (i.e., without relying on network connectivity), are important for providing greater service availability to potential clients. Because much of the content that requires connectivity is not cacheable, there is a trend towards downloading "un-trusted" code, which may place limitless demands on the resources available to the client. Although such resource management has a broad area of application, this paper examines disk resource management for write buffering, where data are stored for disconnected services with the intent that they will be evacuated at a later time. We explore a per-service popularity algorithm to address the write buffering problem effectively. In doing so, we present a system that implements an 'automatic' disk resource management policy and examine how it performs relative to the more rudimentary techniques already in use. Finally, we discuss how our system provides greater service availability by allowing flexibility in the introduction of new services while also providing greater disk access to existing ones.
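A per-service popularity policy can be illustrated by the simplest proportional rule: divide the write-buffer budget among services according to their observed access counts. This sketch is purely illustrative (the service names and the use of raw access counts as the popularity signal are assumptions), and it is not the thesis's algorithm.

```python
def quotas(popularity, disk_bytes):
    """Divide a write-buffer disk budget among services in proportion to
    their observed popularity (here, raw access counts). A newly added
    service simply gets a new entry in the popularity map."""
    total = sum(popularity.values())
    return {svc: disk_bytes * count // total
            for svc, count in popularity.items()}

# A service accessed three times as often gets three times the buffer space.
alloc = quotas({"mail": 3, "news": 1}, 4000)
```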

DOWNLOAD tr01-48.pdf


Beg, Mirza. "A Memory Accounting Interface for The Java Programming Language". The University of Texas at Austin, Department of Computer Sciences. Report# TR-01-40 (honors). October 2001. 8 pages.

Widespread use of the Internet infrastructure for deploying services creates new issues and raises serious concerns regarding the security of their execution environment. Ideas of employing dynamic distributed systems for mounting e-services on the web are gaining strength. The main idea behind their proposed design is the use of distributed extensions, which permits execution of un-trusted service code at clients, content distribution service machines, or proxies, in order to make dynamic services more effective. Over the past few years Java has surfaced as an attractive option for constructing web services and programming their execution environment. Java provides automatic memory management but fails to provide an accounting interface; to make such services more secure, the language needs a robust resource accounting interface. This paper discusses the design and implementation of a memory accounting interface as a key component of resource management, along with the issues that arose during implementation. To consider its practical application, we evaluate the performance and accuracy of this system.

DOWNLOAD tr01-40.pdf


Porter, George. "A Commuting Diagram Relating Threaded and Non-threaded JVM Models". The University of Texas at Austin, Department of Computer Sciences. Report# TR-01-27 (honors). July 2001. 68 pages.

We establish a commuting diagram that relates two models of the Java Virtual Machine (JVM). The first model, M3, supports much of Java, including classes, objects, and dynamic method resolution. The second model, M4, builds upon M3 by adding threads, monitors, and synchronized methods. We describe a theorem, Main, that asserts that running certain "single-threaded" states on M4 is equivalent to transforming those states to the domain of M3, running the transformed state there, and translating the result back to the domain of M4. We define the criteria we use to determine if the resulting states are equivalent, and we define our notion of "single-threaded". We then discuss a few lessons learned during the development of Main.

DOWNLOAD tr01-27.pdf


Klingner, Jeff. "Visualizing Sets of Evolutionary Trees". The University of Texas at Austin, Department of Computer Sciences. Report# TR-01-26 (honors). May 2001. 19 pages.

One of the problems with current methods for phylogenetic reconstruction is the large number of equally parsimonious trees that are often found during a tree search; understanding these large sets of trees is a challenge for biologists. I explored the utility of a data visualization technique in creating 2D and 3D images of tree sets in order to improve researchers' understanding of the sets. I used multidimensional scaling based on Robinson-Foulds inter-tree distances to construct the visualizations. Direct visualization of taxa differences was also explored. I found that structure in the tree sets was reflected in the visualizations. Visual clustering in our pictures corresponds to islands of phylogenetic trees in the set and also reveals additional structure. Direct visualization of taxa differences provides a good alternative to displaying divergence with branch lengths and may be useful in a divide-and-conquer approach to phylogenetic reconstruction. I integrated all of the computational steps needed for building the visualizations into a module for the phylogeny software package Mesquite.
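The pipeline's first step, the inter-tree distance matrix, can be sketched as below. For brevity the sketch computes Robinson-Foulds distances on small rooted trees written as nested tuples (the thesis, like standard practice, works with unrooted bipartitions); the resulting matrix is what would then be fed to multidimensional scaling.

```python
def clades(tree):
    """Collect the non-trivial clades of a rooted tree given as nested
    tuples, e.g. ((("A","B"),"C"),("D","E")). Each clade is a frozenset
    of taxon names."""
    found = set()
    def walk(node):
        if isinstance(node, str):          # leaf
            return frozenset([node])
        members = frozenset().union(*(walk(child) for child in node))
        found.add(members)
        return members
    whole = walk(tree)
    found.discard(whole)  # the full taxon set is uninformative
    return found

def rf_distance(t1, t2):
    """Robinson-Foulds distance: size of the symmetric difference of the
    two trees' clade sets."""
    return len(clades(t1) ^ clades(t2))

trees = [((("A", "B"), "C"), ("D", "E")),
         ((("A", "C"), "B"), ("D", "E")),
         ((("A", "B"), "D"), ("C", "E"))]
matrix = [[rf_distance(a, b) for b in trees] for a in trees]
```

Multidimensional scaling then places each tree as a point in 2D or 3D so that inter-point distances approximate this matrix; clusters of points correspond to islands of similar trees.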

DOWNLOAD tr01-26.pdf


Perez-Bergquist, Andres Santiago. "Applying ESP and Region Specialists to Neuro-Evolution for Go". The University of Texas at Austin, Department of Computer Sciences. Report# TR-01-24 (honors). May 2001. 23 pages.

Go is one notable board game where computer competence still trails behind that of human experts. In the past, neural-network-based approaches have shown promise. In this paper, the ESP variant of the SANE neuro-evolution algorithm was applied to Go, and an alternate network architecture featuring subnetworks specialized for certain board regions was implemented. ESP produced simpler networks that performed as well as the more complex ones produced by SANE in other studies. Adding region-specialist subnetworks improved performance only marginally, and both the simple network and the network with specialists failed to scale up to board sizes larger than 7x7.

DOWNLOAD tr01-24.pdf


Pacheco, Carlos. "Reasoning about TLA Actions". The University of Texas at Austin, Department of Computer Sciences. Report# TR-01-16 (honors). May 2001. 196 pages.

We use the ACL2 theorem prover to verify invariants of a distributed algorithm specified in TLA (Temporal Logic of Actions). The algorithm, Disk Synod, achieves consensus among a set of processors communicating through disks. We discuss the translation of TLA specifications into a finite set theory framework in ACL2, as well as the proof of two invariant properties of Disk Synod.

DOWNLOAD tr01-16.pdf


Chun-Chi Chen. "Automatic Music Composition using Genetic Algorithm and Neural Networks: A Constrained Evolution Approach". The University of Texas at Austin, Department of Computer Sciences. Report# HR-00-02 (honors theses). September 2000. 16 pages.


DOWNLOAD TR-2141.pdf


Chern Han Yong. "Cooperative Coevolution of Multi-Agent Systems". The University of Texas at Austin, Department of Computer Sciences. Report# HR-00-01 (honors theses). September 2000. 18 pages.


DOWNLOAD TR-2140.pdf


Todd Greer. "Evolving Hierarchical Neural Networks to Play Go". The University of Texas at Austin, Department of Computer Sciences. Report# HR-98-01 (honors theses). March 1998. 18 pages.


DOWNLOAD TR-2142.pdf


Rupert Tang. "A Connectionist Corpus-Based Approach to the Building of Word Representations". The University of Texas at Austin, Department of Computer Sciences. Report# HR-94-01 (honors theses). September 1994. 13 pages.


DOWNLOAD TR-2143.pdf


Watts, Jerrell. "Efficient Collective Communication on Multidimensional Meshes with Wormhole Routing". University of Texas at Austin, Department of Computer Sciences. Report# TR-94-19 (honors). June 1994. 45 pages.

This work presents a methodology for constructing efficient collective communication libraries for mesh-based multicomputers with wormhole routing. Such libraries can be built from a small set of simple primitives. Specifically, two classes of routines are presented: short vector routines that minimize message startups and long vector routines that minimize data transmission. Routines from these two classes can then be hybridized, resulting in algorithms which are efficient over a wide range of machine performance and vector size parameters. Implementations of these basic routines are given, and the derivation of other routines from them is demonstrated. Furthermore, these methods are applicable to meshes in their full generality: There are no restrictions on the relative dimensions of the mesh or the number of nodes contained therein (e.g., they do not require power-of-two partition sizes). Also, the schemes presented here inherently support group communications as well as full-partition operations. Performance data from the Intel Paragon are included.
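The hybridization idea can be illustrated with the standard alpha-beta (startup plus per-byte) communication cost model: a short-vector broadcast minimizes startups, a long-vector broadcast minimizes bytes moved, and the hybrid picks whichever the model predicts is cheaper. The formulas below are textbook estimates for two broadcast strategies, not the thesis's exact algorithms.

```python
import math

def mst_bcast_cost(p, n, alpha, beta):
    """Minimum-spanning-tree broadcast: ~log2(p) startups, but the full
    n-byte vector travels on every hop. Best for short vectors."""
    return math.ceil(math.log2(p)) * (alpha + n * beta)

def scatter_allgather_cost(p, n, alpha, beta):
    """Scatter followed by allgather: roughly twice the startups, but only
    about 2n bytes moved in total. Best for long vectors."""
    steps = math.ceil(math.log2(p))
    return 2 * steps * alpha + 2 * (p - 1) / p * n * beta

def hybrid_bcast_cost(p, n, alpha, beta):
    """Hybrid: pick whichever routine the alpha-beta model says is cheaper."""
    return min(mst_bcast_cost(p, n, alpha, beta),
               scatter_allgather_cost(p, n, alpha, beta))
```

On a 16-node partition with a startup cost 100x the per-byte cost, the model picks the MST routine for a 10-byte vector and the scatter-allgather routine for a 10000-byte vector, which is the crossover behavior the hybrid exploits.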

DOWNLOAD tr94-19.pdf


James A. Bednar. "Multi-Level Neural Network Language Translator". University of Texas at Austin, Department of Computer Sciences. Report# HR-93-01 (honors theses).


DOWNLOAD TR-2144.pdf
