Catalog record: Aguilar, David P. A radial basis neural network for the analysis of transportation data [electronic resource]. [Tampa, Fla.]: University of South Florida, 2004. Thesis (M.S.C.S.), University of South Florida, 2004. Includes bibliographical references. Document contains 82 pages. Adviser: Rafael Perez. USF Electronic Theses and Dissertations, http://digital.lib.usf.edu/?e14.515

A Radial Basis Neural Network for the Analysis of Transportation Data

by

David P. Aguilar

A thesis submitted in partial fulfilment of the requirements for the degree of
Master of Science in Computer Science
Department of Computer Science and Engineering
College of Engineering
University of South Florida

Major Professor: Rafael Perez, Ph.D.
Srinivas Katkoori, Ph.D.
Dewey Rundus, Ph.D.

Date of Approval: November 4, 2004

Keywords: radial basis function, learning optimization, gaussian units, back propagation, regression analysis

Copyright 2004 David P. Aguilar

Table of Contents

List of Tables
List of Figures
Abstract
Chapter 1. Introduction
  1.1. Motivation for this Thesis
  1.2. Contribution to the Discipline
  1.3. Structure of Thesis Report
Chapter 2. Theoretical Foundations
  2.1. The Basics of Neural Networks
  2.2. The Radial Basis Function
  2.3. Training Methods
    2.3.1. Output Layer: The Delta Rule
    2.3.2. Hidden Layer 1: Movable Centers
    2.3.3. Hidden Layer 3: Adjustable Sigmas
  2.4. Training Algorithm
Chapter 3. Implementation and Training
  3.1. Program Specifications
    3.1.1. Program Overview
    3.1.2. The File System
    3.1.3. The Train Function
    3.1.4. The Test Function
    3.1.5. The Run Function
    3.1.6. The Classification and Report Module
  3.2. Implementation of Mathematical Formulae
  3.3. Basic Functions A: Delta Rule Training (Fixed Centers)
  3.4. Basic Functions B: Delta Rule vs. Fixed Centers and Delta
  3.5. Basic Functions C
    3.5.1. Manually Adjusting the Sigma
    3.5.2. Automatically Adjusting the Sigma
  3.6. Analysis of Preliminary Training (OR, AND, XOR)
Chapter 4. Application Initialization
  4.1. Description of Data
  4.2. CUTR Analysis Results
    4.2.1. Forward Regression
    4.2.2. Backward Regression
    4.2.3. Stepwise Regression
    4.2.4. Force-entered Regression
  4.3. Selection of Network Architecture
    4.3.1. Number and Selection of Inputs
    4.3.2. Number of Neurons and Epochs
Chapter 5. Execution and Analysis
  5.1. Training and Testing Results on Application Data
  5.2. Analysis of Performance
Chapter 6. Performance Comparisons
  6.1. Comparison with Regression Results
  6.2. Comparison with Backpropagation Networks
Chapter 7. Conclusions
  7.1. Final Analysis
  7.2. Future Work
References
Appendices
  Appendix A: Table of Incentive Variables and Dependent Variable
  Appendix B: Code Modules
  Appendix C: Sample Files

List of Tables

Table 1. Level 1 Training Results During Network Testing
Table 2. Level 2 Training Results During Network Testing
Table 3. SSE and Accuracy After 300 Epochs For Level 1 Training
Table 4. SSE and Accuracy After 300 Epochs For Level 2 Training
Table 5. Level 3 Training Results During Network Testing
Table 6. Summary of Training Results on Binary Functions
Table 7. Classification of Data and The Bin Ranges
Table 8. Force-Entered Regression Results
Table 9. Force-Entered Network Results (0 Hidden Units)
Table 10. Backward Regression Results
Table 11. Backward Regression Network Results (0 Hidden Units)
Table 12. Backward Network Results (With Hidden Units)
Table 13. Forward Regression Results
Table 14. Forward Regression Network Results (0 Hidden Units)
Table 15. Forward Regression Network Results (With Hidden Units)
Table 16. Stepwise Regression Results
Table 17. Stepwise Regression Network Results (0 Hidden Units)
Table 18. Stepwise Regression Network Results (With Hidden Units)
Table 19. Stepwise Regression Network Results With Costs
Table 20. Stepwise Results: Hidden Units/Reduced Dataset
Table 21. Varying Percentage of Data For Neuron Number (Bin 3)
Table 22. Varying The Number of Epochs For Classifying Bin 3
Table 23. Network Training Results
Table 24. Network Classification Results
Table 25. Error Analysis
Table 26. Comparison of RBF Network and Regression Results
Table 27. Comparison of RBF Network and Regression Networks 1
Table 28. Comparison of RBF Network and Regression Networks 2
Table 29. Table of Incentive Variables and Dependent Variable

List of Figures

Figure 1. Linear Separability in 2D Space and 3D Space
Figure 2. A Two-Layer Network Trained For The XOR Function
Figure 3. Threshold and Sigmoid Functions vs. Gaussian Function
Figure 4. TLU Decision Boundaries vs. Gaussian Boundaries
Figure 5. A Radial Basis Function (RBF) Neural Network
Figure 6. The XOR Problem in 2 and 3 Dimensional Space
Figure 7. The Main Menu of The Network Interface
Figure 8. Training Level 1: Adjusting Output-Layer Weights
Figure 9. Training Curve Displaying The SSE Values
Figure 10. Network Diagram Output
Figure 11. Training Level 2: Adjusting Weights and Centers
Figure 12. Training Level 3: Adjusting Weights, Centers and Sigmas
Figure 13. Flowchart of The Test Function
Figure 14. Diagram of The Run Function Algorithm
Figure 15. SSE Curves For OR, AND and XOR Respectively (Set 1)
Figure 16. SSE Curves For OR, AND and XOR Respectively (Set 2)
Figure 17. SSE Curves For OR, AND and XOR Respectively (Set 3)
Figure 18. Diagram of Training Results with Variable Neurons
Figure 19. Effects of Training on SSE Values
Figure 20. Effects of Training on Accuracy
Figure 21. Radial Basis Networks and Regression Analysis
Figure 22. Radial Basis Networks and Backpropagation Analysis
Figure 23. Radial Basis Networks / Hidden Backpropagation Analysis

A Radial Basis Neural Network for the Analysis of Transportation Data

David P. Aguilar

ABSTRACT

This thesis describes the implementation of a Radial Basis Function (RBF) network to be used in predicting the effectiveness of various strategies for reducing the Vehicle Trip Rate (VTR) of a worksite. Three methods of learning were utilized in training the Gaussian hidden units of the network: a) output weight adjustment using the Delta rule, b) adjustable reference vectors in conjunction with weight adjustment, and c) a combination of adjustable centers and adjustable sigma values for each RBF neuron with the Delta rule. The justification for utilizing each of the more advanced levels of training is provided using a series of tests and performance comparisons. The network architecture is then selected based upon a series of initial trials for an optimum number of hidden Radial Basis neurons. In a similar manner, the training time is determined after finding a maximum number of epochs during which there is a significant change in the SSE.
The network was compared for effectiveness against each of the following methods of data analysis: force-entered regression, backward regression, forward regression, stepwise regression, and two types of backpropagation networks based upon the attributes discovered to be most predictive by these regression techniques.

A comparison of the learning methods used on the Radial Basis network shows the third learning strategy to be the most efficient for training, yielding the lowest sum of squared errors (SSE) in the fewest training epochs. The result of comparing the RBF implementation against the other methods mentioned shows the superiority of the Radial Basis method for predictive ability.

Chapter 1
Introduction

According to the Air Resources Board, established in 1967, emissions from cars and trucks are the single most significant factor contributing to the air quality issue facing urban areas within the state of California. Although the cars and trucks produced today generate significantly fewer pollutants than those manufactured in the 1970s, Californians continue to lose billions of dollars a year due to air pollution and its related problems [1]. These problems are not only directly related to human health, but also involve environmental concerns affecting both wildlife and cultivated resources; the issue is therefore of consequence to a wide range of interest groups.

The technology involved in creating more fuel-efficient vehicles has come a long way, but one of the most effective methods of maintaining the standards of air quality is simply to burn less fuel. In a highly motorized area such as Los Angeles, however, most people prefer the convenience of providing their own transportation to and from their places of employment. In order to offset this tendency, many large companies have attempted to implement trip reduction strategies, which provide incentives for individuals to use public transportation, group transportation or non-automotive means of travel such as bicycles. Wherever possible, telecommuting programs are also introduced.

The Air Resources Board (ARB) and other agencies have suggested a wide array of such strategies and, in order to keep the pollution level within acceptable standards, have required large employers to submit plans of action which they must undertake in order to influence their Average Vehicle Ridership (AVR), which is the ratio of employees to motor vehicles that arrive at a given worksite. An increase in the AVR level, which corresponds to a decrease in the Vehicle Trip Rate (VTR) discussed during the course of the thesis, indicates an effective strategy. The large volume of data collected on these various methods to date facilitates the detailed analysis of these techniques' effectiveness using the various predictive methods at our disposal.

1.1. Motivation for this Thesis

Even a cursory glance at a spreadsheet containing the obtained information on trip reduction strategies reveals the problem associated with determining the effectiveness of any particular method. A large set of factor combinations, and the difficulty in determining the degree to which each factor influences the outcome of the strategies, result in a significant barrier facing those companies wishing to implement effective incentive plans to increase their average vehicle ridership (AVR). The issue facing large employers in light of this situation is how to provide the best set of incentives for their employees in order to achieve compliance with emissions regulations. Most of the ARB suggestions are not inexpensive, and it is therefore essential that the data accumulated thus far be examined for useful information.

A number of methods have been proposed and implemented for the analysis of this particular set of data, but the results have not been spectacular. Statistical techniques such as regression analysis, as well as backpropagation neural networks, have predicted the effectiveness of the various incentives with levels of accuracy in the order of 20%, although a recent RBF network using the techniques that form the basis for this thesis has managed to increase that number somewhat [2]. Radial Basis networks have shown the most promise so far in their ability to find the relationship between the sets of programs and their impact on VTR.

The purpose of this thesis is to continue exploration in that area, utilising a refined data set and a different statement of the problem from that previously used, in an attempt to obtain more accurate prediction results. The thesis also presents a comparison of the results of this procedure with previous attempts at building a model of the data, as well as more recent attempts using other approaches to the analysis of this current, reduced data set.

1.2. Contribution to the Discipline

A further exploration of the Radial Basis Function network, which has been the most effective means thus far of building a model of the data, is undertaken in this body of work. Aside from refinements made to the data set upon which the model is being built, the problem statement and results themselves are stated in a new way. Levels of effectiveness are divided into 8 stages or bins, and one approach is to use the RBF networks as binary classifiers, to determine which of these levels of impact the data records (which indicate combinations of incentive programs) fit. Another innovation allowed by converting the analysis into a set of binary functions is the ability of the implemented software to select its own threshold for the admission of a record into a particular bin. Thus, the maximum possible percentage accuracy is obtained by any given method of training for the initial neural network.

The results of the thesis confirm that the more sophisticated methods of RBF network training, involving the movement of reference vectors within the neurons and the adjustment of sigma values controlling the slope of the Gaussian (comparison) function, are most effective. They also reveal the effectiveness of using Radial Basis networks as classifiers in dealing with large, sophisticated patterns of attributes within a data set.

1.3. Structure of Thesis Report

The remainder of the thesis report is organized as follows. The basic theory of Artificial Neural Networks is discussed in Chapter 2. Also presented there are the benefits of the Radial Basis Function, the training methods associated with networks using this technique, and the algorithms used during the course of this thesis. Chapter 3 presents the specific implementation used to study the transportation data, providing initial results and comparisons of the training methods on simple binary functions. Chapter 4 describes the data set dealing with the various incentives being studied, the methods used to reduce and refine this information, and the procedures used in preparing the data for network analysis. Chapter 5 gives the results of training and testing the data on the RBF networks, as well as a statistical analysis of the various methods. Chapter 6 compares these results with alternative examination techniques used by the Center for Urban Transportation Research (CUTR). Chapter 7 gives the conclusions of the thesis and a discussion of potential future work.

Chapter 2
Theoretical Foundations

The basic principles of Artificial Neural Networks and their structures are discussed in this chapter, as well as the factors influencing the decision to use Radial Basis Functions for analysing the transportation data.
The structures and mathematical bases of the implemented network are discussed in detail, providing the formulas used and a brief description of how they were derived. The training methods provided by the software implementation are introduced, and the specific algorithm used is described.

2.1. The Basics of Neural Networks

Put in simple terms, an Artificial Neural Network (ANN) is a software application that alters certain variables in response to a set of corresponding input and output patterns. Beginning with an initial set of internal values, the network modifies these quantities in order to find a position of best fit, thereby generating from the input patterns their expected results. For example, a basic network can be trained to act as an AND logic gate, giving an output of 1 when the binary input vector (1, 1) is applied.

Even the simplest of these networks can quickly learn to identify sets of patterns that exhibit the quality of linear separability. This indicates that the outputs can be classified according to a decision boundary defined by the equation:

$$y = a x_1 + b x_2 + c x_3 + \dots + n x_n + z \qquad (1.1)$$

where (n + 1) is the number of dimensions of the input vector. This can be determined and demonstrated graphically, as in Figure 1, by attempting to find a line, plane or hyperplane (depending on the dimensionality of the input pattern) that separates the outputs into distinct classes.

Figure 1. Linear Separability in 2D Space and 3D Space

Most classification problems, however, are far more complex than this, and the networks are required to be correspondingly more sophisticated in order to correctly identify these patterns. A simple binary function such as XOR, for example, is not linearly separable, and therefore presents a problem for single-layer systems or perceptrons. The solution to this is to add another layer, which acts as an inhibitor, in order to more effectively define a specific solution space.
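As a small illustration of linear separability, the following Java sketch shows a single threshold unit acting as an AND gate. The class name, the weights (1, 1) and the bias of -1.5 are assumptions chosen for this example; they are not taken from the thesis software.

```java
/** Minimal threshold unit sketch; the weights and bias are illustrative assumptions. */
class ThresholdUnitDemo {

    /** Fires (returns 1) when the weighted sum of the inputs plus the bias is positive. */
    static int fire(double[] weights, double bias, int[] inputs) {
        double sum = bias;
        for (int k = 0; k < inputs.length; k++) {
            sum += weights[k] * inputs[k];
        }
        return sum > 0.0 ? 1 : 0;
    }

    /** AND gate: the line x1 + x2 - 1.5 = 0 separates (1, 1) from the other three patterns. */
    static int and(int x1, int x2) {
        return fire(new double[] {1.0, 1.0}, -1.5, new int[] {x1, x2});
    }

    public static void main(String[] args) {
        for (int x1 = 0; x1 <= 1; x1++) {
            for (int x2 = 0; x2 <= 1; x2++) {
                System.out.println("(" + x1 + ", " + x2 + ") -> " + and(x1, x2));
            }
        }
    }
}
```

No comparable choice of two weights and a bias reproduces XOR, which is why that function calls for either an additional layer or an additional derived input.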
Theoretically, a 3-layer neural network is sufficiently advanced to separate any grouping of continuous inputs, provided the network is large enough [3]. Multilayer networks such as that shown in Figure 2, also called feedforward networks or multilayer perceptrons, have experienced a great deal of success in solving problems where a large amount of data is involved. The ability of the networks to generalize relationships between inputs and outputs is key to their effectiveness. The limitation of multilayer networks is found in the absence of a method to efficiently train them to adapt to new data. Due to the inherent complexity of the architecture, deciding which values to adjust in order to arrive at more accurate classifications is an NP-complete problem even in a network consisting of only 3 nodes or neurons [4].

Figure 2. A Two-Layer Network Trained For The XOR Function

Algorithms such as the Back Propagation method have been effective at providing high levels of accuracy on difficult problems of various types, and although certain inefficiencies in these algorithms have been pointed out recently, refinements are being made with considerable success [5]. For all these related algorithms, the basic method of learning consists of calculating an error term dependent on the difference between the expected and actual outputs, adjusting the weights between the neurons in order to lower that term, and propagating the effects of the error term through previous layers of weights. This process is repeated until the level of difference between obtained and expected outputs is within an acceptable boundary [6]. The applications of the resulting multilayer perceptron systems are found in diverse fields of industry and research, and the numerous refinements and modifications to the basic algorithms that exist [7] are often used to carry out highly specialized operations.

2.2. The Radial Basis Function

The basic model of multilayer neural networks utilizes either sigmoid or threshold neurons. Threshold logic units (TLUs) generate a signal, or fire, when the sum of their input signals exceeds a certain level. Sigmoid units provide a more graceful degradation of output signals, and because the derivative of this function can be taken, the back propagation method discussed above, which relies on this value, can be implemented for training purposes.

Radial Basis Function (RBF) neural networks operate on a somewhat different principle. Instead of having threshold units with a single value against which to compare accumulated sums of input signals, each RBF neuron has a set of values called a reference vector for comparison with an input set of the same cardinality. The driving formula of each neuron used in this thesis work is the multivariate Gaussian function:

$$\varphi(r) = \exp\left(-\frac{r^2}{2\sigma_j^2}\right), \qquad r = \lVert \mathbf{x} - \mathbf{t}_j \rVert \qquad (2.1)$$

where $\mathbf{x}$ is the input vector for the neuron, $\mathbf{t}_j$ is the set of reference values, $\sigma_j$ is the standard deviation ($\sigma^2$ is the variance) of the function for each of the centers (j), and the value $r = \lVert \mathbf{x} - \mathbf{t}_j \rVert$ is the Euclidean distance between a center vector and the set of data points [6][8]. The output of each neuron is reflective of the similarity between input and reference vectors according to the bell curve shown in Figure 3 below. A high degree of similarity produces an output ($\varphi$) that approaches 1. Figure 4 provides a graphical representation of the type of classification ability afforded by a two-level threshold logic system compared with a radial basis function.

Figure 3. Threshold and Sigmoid Functions vs. Gaussian Function

It should be understood that the Gaussian function is not the only selection available for RBF networks.
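The response of a single Gaussian unit can be sketched in Java as follows. This is a minimal illustration that assumes the usual form exp(-r^2 / (2 sigma^2)) with r the Euclidean distance to the reference vector; the class and method names are hypothetical and are not the thesis's NEURON.COMPUTE routine.

```java
/** Sketch of one Gaussian hidden unit; names and the exact functional form are assumptions. */
class GaussianUnitDemo {

    /** Squared Euclidean distance between an input vector and a reference vector. */
    static double squaredDistance(double[] x, double[] t) {
        if (x.length != t.length) {
            throw new IllegalArgumentException("input and reference vectors differ in length");
        }
        double sum = 0.0;
        for (int k = 0; k < x.length; k++) {
            double d = x[k] - t[k];
            sum += d * d;
        }
        return sum;
    }

    /** Gaussian response: 1 when x equals t, falling toward 0 as x moves away from t. */
    static double gaussian(double[] x, double[] t, double sigma) {
        return Math.exp(-squaredDistance(x, t) / (2.0 * sigma * sigma));
    }

    public static void main(String[] args) {
        double[] center = {0.0, 0.0};
        double sigma = 1.0;
        double[][] inputs = {{0.0, 0.0}, {1.0, 0.0}, {1.0, 1.0}};
        for (double[] x : inputs) {
            System.out.println("(" + x[0] + ", " + x[1] + ") -> " + gaussian(x, center, sigma));
        }
    }
}
```

Inputs identical to the reference vector give a response of exactly 1, and the response shrinks monotonically with distance, which is the local behaviour the bell curve describes.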
Other basis functions considered typical for RBF networks are the thin-plate-spline function:

$$\varphi(r) = r^2 \log(r) \qquad (2.2)$$

the multiquadric function:

$$\varphi(r) = \sqrt{r^2 + \sigma^2} \qquad (2.3)$$

and the inverse multiquadric function:

$$\varphi(r) = \frac{1}{\sqrt{r^2 + \sigma^2}} \qquad (2.4)$$

The justification for using the standard Gaussian units for this thesis work is found in the ability of networks using this formula to make effective local adjustments; they adapt to new data without significantly altering the results generated from the input patterns that have already been learned. This local refinement property makes the Gaussian ideal for use on data with a considerable number of input fields and a large set of records [8].

Figure 4. TLU Decision Boundaries vs. Gaussian Boundaries

The most common architecture for an RBF network, which is the one used in the software associated with this thesis, is the fully connected, single hidden layer arrangement, an example of which is provided by Figure 5.

Figure 5. A Radial Basis Function (RBF) Neural Network

In such an arrangement, each of the neurons of the hidden layer operates on the Gaussian formula given above, comparing the input with a reference vector of a length equal to the number of inputs. The output layer in this case consists of a single neuron, although radial basis networks are fully capable of supporting numerous such units, and is defined by a linear function of the hidden units' values. Networks such as this are able to perform complex pattern recognition tasks by transforming the data into a higher-dimensional space, which, according to Cover's separability theorem [9], makes linear separability for classification more likely.

A simple example of this can be demonstrated by use of the XOR function discussed in Section 2.1. As Figure 2 illustrates, no single, linear decision boundary can be drawn that will separate the two types of data (designated by red and green dots) into distinct classes. If we add a third dimension, however, which is a function of the inputs, a boundary between the two classes can be found. The set of input values x for a binary XOR function may be listed as follows: $x_1 = (0,0)$; $x_2 = (0,1)$; $x_3 = (1,0)$; $x_4 = (1,1)$. The expected outputs of the function would therefore be 0, 1, 1, and 0 respectively. During Haykin's discussion of Cover's theorem, he demonstrates the Gaussian function's ability to differentiate between the two classes of inputs with only two neurons [6]. A similar (and simplified) network producing identical results may be obtained, for purposes of explanation, by defining a function:

$$\varphi(x_i) = \lvert x_{i1} - x_{i2} \rvert \qquad (2.5)$$

resulting in the mapping:

$x_1 = (0,0) \to 0$
$x_2 = (0,1) \to 1$
$x_3 = (1,0) \to 1$
$x_4 = (1,1) \to 0$

This allows a bisection of the classes of inputs provided, and in this simple example the resulting $\varphi$ of each input set corresponds precisely with the expected outputs of the XOR function. By providing a function of the existing inputs as another dimension of the input vector, a plane can be found in the resulting 3-dimensional space that allows linear classification to take place (see Figure 6).

Figure 6. The XOR Problem in 2 and 3 Dimensional Space

In feedforward networks, this is done by adding another layer of neurons, or by preprocessing the data, extracting derived functions before applying the inputs to the neurons [10]. In an RBF network, this principle applies in the following way: each of the hidden neurons contains a complete reference vector corresponding to the number of inputs. Each comparison between the input vector and these reference vectors represents a dimension for use by the output layer in determining classification. When the number of neurons is equivalent to the cardinality of the input vector, each neuron may be used to respond to a single pattern of inputs.
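The derived-dimension idea behind Equation 2.5 can be sketched in Java as follows. The class name and the separating plane (the third coordinate compared against 0.5) are assumptions made for illustration only.

```java
/** Sketch of the XOR mapping of Equation 2.5; the separating plane is an assumed example. */
class XorMappingDemo {

    /** Derived third dimension: the absolute difference of the two binary inputs. */
    static int phi(int x1, int x2) {
        return Math.abs(x1 - x2);
    }

    /** Linear classification in the augmented space (x1, x2, phi) using the plane phi = 0.5. */
    static int classify(int x1, int x2) {
        double sum = 0.0 * x1 + 0.0 * x2 + 1.0 * phi(x1, x2) - 0.5;
        return sum > 0.0 ? 1 : 0;
    }

    public static void main(String[] args) {
        int[][] patterns = {{0, 0}, {0, 1}, {1, 0}, {1, 1}};
        for (int[] p : patterns) {
            System.out.println("(" + p[0] + ", " + p[1] + ") -> phi = " + phi(p[0], p[1])
                    + ", class = " + classify(p[0], p[1]));
        }
    }
}
```

Once the third coordinate is available, a single linear threshold suffices, even though no such threshold exists over the two original inputs alone.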
Dedicating a neuron to each input pattern is ideal where there are limited, finite input patterns, but a network that has become familiar with the input variables to this degree is often not optimal for continuous data. If a network perfectly classifies training data, yet is unable to accurately recognize the patterns when provided with new but similar input patterns, the system is described as being overtrained. This overtraining effect, also called overfitting, results from a loss of generality. The neurons become attuned to precise patterns of information, focussing on non-generic aspects of the training set, and are therefore unable to recognize those sets of data which deviate in the slightest manner from the sequences on which they were trained [11]. The obvious remedy to this problem is to reduce the number of hidden units, and the selection of the number of neurons in a given network is therefore an important aspect of its architecture. This aspect required due consideration when implementing the systems used for obtaining the results presented in this thesis, and is discussed in more detail in Chapter 4.

The methods employed in order to train such a reduced neuron system are based upon the properties of Green's functions, which allow a process of supervised selection of centers to regulate the variables within the reference vectors of those neurons. The Gaussian function, an example of such a function, takes advantage of the techniques outlined below in order to exhibit the quality known as approximation, in which each neuron is able to identify multiple patterns, which are further differentiated by the output layers. An exposition of the approximation techniques based upon Gaussian properties is provided by Haykin [6], and the formulas themselves with their derivations are discussed in [12].

2.3. Training Methods

2.3.1. Output Layer: The Delta Rule

The networks employed during the course of this thesis consist of two types of neurons: the Gaussian units of the hidden layer and the linear unit of the output. The training of the output weights can be accomplished by using the Delta Rule, a common technique for adjusting the values of single-layer perceptrons and feedforward networks. The formula employed is:

$$w_j(n+1) = w_j(n) + \eta_1 \Delta w \qquad (2.6)$$

where $w$ is the weight connecting a hidden unit with the output layer, $j$ is the hidden unit to which the weight is connected and $n$ is the current epoch, or iteration index of the training set. The learning constant employed by the network for the delta rule is designated by $\eta_1$, and $\Delta w$ is the change in weight generated by the error term. This error term $e_i$ is a function:

$$e_i = T_i - Z_i \qquad (2.7)$$

where $T_i$ is the target value and $Z_i$ is the obtained value from the network, the linear sum of all outputs from the Gaussian layer:

$$Z_i = \sum_{j=1}^{M} w_j \, \varphi(\lVert \mathbf{x}_i - \mathbf{t}_j \rVert) \qquad (2.8)$$

$M$ is the number of neurons in the system and $\varphi(r)$ is defined by Equation 2.1. The error term $e$ is used to determine the $\Delta w$ value using the following equation:

[equation not recoverable from the extracted text] (2.9)

where $N$ is the total number of input patterns from the data set.

2.3.2. Hidden Layer 1: Movable Centers

While using the Delta rule to modify output weights may prove sufficient for many applications, data of high dimensionality, such as those analysed during the course of this thesis, require more sophisticated techniques to obtain a reasonable degree of classification accuracy. A refinement of network training involves modifying the reference vectors contained within the Gaussian neurons in order to more accurately reflect the input patterns to which they are exposed. This provides a set of more significant (and potentially significantly different) values from the hidden layer to be summed by the output layer for classification and prediction.
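The Delta rule training of the output weights can be sketched in Java as follows. Because Equation 2.9 is not available here, the sketch assumes the common batch form in which the weight change for neuron j is the sum over all patterns of the error times that neuron's Gaussian output. The OR data, the placement of one fixed center on each pattern, the sigma of 0.5 and the learning constant of 0.05 are likewise assumptions for illustration.

```java
/** Sketch of level 1 training (fixed centers, Delta rule); the form of the weight change is assumed. */
class DeltaRuleDemo {

    static double gaussian(double[] x, double[] t, double sigma) {
        double r2 = 0.0;
        for (int k = 0; k < x.length; k++) {
            r2 += (x[k] - t[k]) * (x[k] - t[k]);
        }
        return Math.exp(-r2 / (2.0 * sigma * sigma));
    }

    /** Network output Z: hidden outputs multiplied by the output weights and summed (Equation 2.8). */
    static double output(double[] x, double[][] centers, double sigma, double[] w) {
        double z = 0.0;
        for (int j = 0; j < w.length; j++) {
            z += w[j] * gaussian(x, centers[j], sigma);
        }
        return z;
    }

    /** Sum of squared errors over all patterns. */
    static double sse(double[][] xs, double[] targets, double[][] centers, double sigma, double[] w) {
        double sum = 0.0;
        for (int i = 0; i < xs.length; i++) {
            double e = targets[i] - output(xs[i], centers, sigma, w);
            sum += e * e;
        }
        return sum;
    }

    /** One epoch: accumulate the assumed weight change over all patterns, then apply Equation 2.6. */
    static void trainEpoch(double[][] xs, double[] targets, double[][] centers, double sigma,
                           double[] w, double eta1) {
        double[] deltaW = new double[w.length];
        for (int i = 0; i < xs.length; i++) {
            double e = targets[i] - output(xs[i], centers, sigma, w); // Equation 2.7
            for (int j = 0; j < w.length; j++) {
                deltaW[j] += e * gaussian(xs[i], centers[j], sigma);
            }
        }
        for (int j = 0; j < w.length; j++) {
            w[j] += eta1 * deltaW[j];
        }
    }

    /** Trains from zero weights until the SSE falls to the tolerance or the epoch limit is reached. */
    static double[] train(double[][] xs, double[] targets, double[][] centers, double sigma,
                          double eta1, int maxEpochs, double tolerance) {
        double[] w = new double[centers.length];
        for (int n = 0; n < maxEpochs && sse(xs, targets, centers, sigma, w) > tolerance; n++) {
            trainEpoch(xs, targets, centers, sigma, w, eta1);
        }
        return w;
    }

    public static void main(String[] args) {
        double[][] patterns = {{0, 0}, {0, 1}, {1, 0}, {1, 1}};
        double[] or = {0, 1, 1, 1};
        double[] w = train(patterns, or, patterns, 0.5, 0.05, 300, 1e-6);
        System.out.println("SSE after training: " + sse(patterns, or, patterns, 0.5, w));
    }
}
```

With the centers fixed, the error is a quadratic function of the weights, so a sufficiently small learning constant lowers the SSE at every epoch; this is the property that level 1 training relies on.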
In a manner similar to the Delta rule, the reference vector $t$ within a neuron is changed according to the formula:

$$t_j(n+1) = t_j(n) + \eta_2 \Delta t \qquad (2.10)$$

As before, $n$ is the epoch, and $\Delta t$ indicates the change based upon the error term provided by Equation 2.7. $\eta_2$ is the learning constant for this phase of the training. The $\Delta t$ value is obtained from the function:

[equation not recoverable from the extracted text] (2.11)

The value $\varphi'$ is the first derivative of the Gaussian function (2.1):

[equation not recoverable from the extracted text] (2.12)

2.3.3. Hidden Layer 3: Adjustable Sigmas

The sigma value of the RBF units determines the spread of the bell curve by which the input vectors are compared, that is, the radius of the decision boundary for grouping patterns (see Figures 3 and 4). A large $\sigma$ results in a wide spread, and large areas of identified input values. This can generate good training results if there are few outputs and a large range of acceptable values. On the other hand, it will also generate many false positives when the network is actually used. Conversely, a small $\sigma$ considerably narrows the range of deviation for identifiable input patterns, giving fewer false positives. The drawback is that acceptable values will be missed frequently, particularly if the pattern of expected outputs has a high degree of variability.

Manually adjusting sigma values allows a considerable difference in network performance to be detected, as shown in Section 3.4 (Tables 3 and 4), and a method of allowing the system to automatically adjust its spread in response to the patterns of data on which it is being trained is provided by the following formula:

$$\sigma_j(n+1) = \sigma_j(n) + \eta_3 \Delta\sigma \qquad (2.13)$$

The $\Delta\sigma$, which is multiplied by the learning constant $\eta_3$ in order to provide the change in the sigma per epoch, is defined as follows, using the same $\varphi'$ value from Equation 2.12:

[equation not recoverable from the extracted text] (2.14)

During the actual implementation of the networks used for this thesis, three levels of training are used.
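The center and sigma adjustments of Equations 2.10 and 2.13 can be sketched in Java as follows. The thesis's own expressions for the two changes (Equations 2.11, 2.12 and 2.14) are not available here, so this sketch substitutes the standard gradient-descent forms obtained by differentiating half the sum of squared errors, assuming a Gaussian of the form exp(-r^2 / (2 sigma^2)). All names, starting values and learning constants are hypothetical.

```java
/** Sketch of level 2 and level 3 updates; the gradient forms are assumptions, not the thesis's equations. */
class CenterSigmaUpdateDemo {

    static double squaredDistance(double[] x, double[] t) {
        double sum = 0.0;
        for (int k = 0; k < x.length; k++) {
            sum += (x[k] - t[k]) * (x[k] - t[k]);
        }
        return sum;
    }

    static double gaussian(double[] x, double[] t, double sigma) {
        return Math.exp(-squaredDistance(x, t) / (2.0 * sigma * sigma));
    }

    /** Error e_i = T_i - Z_i for one pattern (Equations 2.7 and 2.8). */
    static double error(double[] x, double target, double[][] t, double[] sigma, double[] w) {
        double z = 0.0;
        for (int j = 0; j < w.length; j++) {
            z += w[j] * gaussian(x, t[j], sigma[j]);
        }
        return target - z;
    }

    /** Half the sum of squared errors, the quantity that the assumed updates descend. */
    static double halfSse(double[][] xs, double[] targets, double[][] t, double[] sigma, double[] w) {
        double sum = 0.0;
        for (int i = 0; i < xs.length; i++) {
            double e = error(xs[i], targets[i], t, sigma, w);
            sum += 0.5 * e * e;
        }
        return sum;
    }

    /** Assumed center change: sum over i of e_i * w_j * phi_ij * (x_i - t_j) / sigma_j^2. */
    static double[][] deltaCenters(double[][] xs, double[] targets, double[][] t, double[] sigma, double[] w) {
        double[][] delta = new double[t.length][t[0].length];
        for (int i = 0; i < xs.length; i++) {
            double e = error(xs[i], targets[i], t, sigma, w);
            for (int j = 0; j < t.length; j++) {
                double common = e * w[j] * gaussian(xs[i], t[j], sigma[j]) / (sigma[j] * sigma[j]);
                for (int k = 0; k < t[j].length; k++) {
                    delta[j][k] += common * (xs[i][k] - t[j][k]);
                }
            }
        }
        return delta;
    }

    /** Assumed sigma change: sum over i of e_i * w_j * phi_ij * r_ij^2 / sigma_j^3. */
    static double[] deltaSigmas(double[][] xs, double[] targets, double[][] t, double[] sigma, double[] w) {
        double[] delta = new double[sigma.length];
        for (int i = 0; i < xs.length; i++) {
            double e = error(xs[i], targets[i], t, sigma, w);
            for (int j = 0; j < sigma.length; j++) {
                double r2 = squaredDistance(xs[i], t[j]);
                delta[j] += e * w[j] * gaussian(xs[i], t[j], sigma[j]) * r2
                        / (sigma[j] * sigma[j] * sigma[j]);
            }
        }
        return delta;
    }

    public static void main(String[] args) {
        double[][] xs = {{0, 0}, {0, 1}, {1, 0}, {1, 1}};
        double[] xor = {0, 1, 1, 0};
        double[][] t = {{0.2, 0.9}, {0.8, 0.1}};
        double[] sigma = {0.7, 0.6};
        double[] w = {0.5, -0.3};
        double eta2 = 0.01, eta3 = 0.01; // hypothetical learning constants
        System.out.println("half SSE before: " + halfSse(xs, xor, t, sigma, w));
        double[][] dT = deltaCenters(xs, xor, t, sigma, w);
        double[] dS = deltaSigmas(xs, xor, t, sigma, w);
        for (int j = 0; j < t.length; j++) {
            for (int k = 0; k < t[j].length; k++) {
                t[j][k] += eta2 * dT[j][k];   // Equation 2.10
            }
            sigma[j] += eta3 * dS[j];         // Equation 2.13
        }
        System.out.println("half SSE after one step: " + halfSse(xs, xor, t, sigma, w));
    }
}
```

Both changes are computed from the parameters as they stood at the start of the epoch and only then applied, so that the center and sigma updates do not interfere with one another within a single step.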
The first stage involves training only the output weights, and the other two methods add movable centers and adjustable sigmas cumulatively.

2.4. Training Algorithm

The basic method of training for the system of networks used during this research consists of the following steps:

1) The input vector is applied to the hidden layer of Gaussian neurons.
2) The resulting values are multiplied by the output weights and summed at the output layer.
3) The output value is compared with the expected value for that input vector.
4) This error term is used to determine how much the system's parameters are to be adjusted in the following steps.
5) The Delta Rule (Equation 2.6) is applied to the output weights.
6) If level 2 or 3 training is selected, Equation 2.10 is used to modify the reference vectors of each of the M neurons of the network.
7) If level 3 training is being used, the sigma value of each RBF unit is altered according to Equation 2.13.
8) If the error rate is above the acceptable level, or if the number of epochs run is less than a predetermined limit, steps 5 to 7 are repeated.

Figures 8, 11 and 12 in the following chapter give graphical representations of the above algorithm, and the various stages are discussed along with the Java implementations on which the transportation data was run.

Chapter 3
Implementation and Training

3.1. Program Specifications

3.1.1. Program Overview

A Gaussian-function radial basis neural network was implemented in Java for the purpose of this thesis work. The object-oriented structure of Java programs lends itself naturally to the modular architecture of the system [13]. Each hidden neuron of the networks is implemented as an object within the virtual environment; the creation of new units is thus a simple process, and the result is a freely modifiable architecture.
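As a rough illustration of steps 1 to 5 and 8 for level 1 training, the following Java sketch trains only the output weights with the Delta Rule, with each hidden neuron held as an object. It is not the thesis code of Appendix B. The class names GaussianUnit and DeltaRuleSketch, the placement of the four centers on the binary input points, the sigma of 0.7, the learning constant of 0.1 and the per-pattern (rather than per-epoch) weight update are all assumptions made for the example.

```java
/** Hypothetical Gaussian hidden unit: a center, a spread and an output weight. */
final class GaussianUnit {
    final double[] center;   // reference vector t_j
    final double sigma;      // spread of the Gaussian
    double weight;           // output weight w_j

    GaussianUnit(double[] center, double sigma, double weight) {
        this.center = center.clone();
        this.sigma = sigma;
        this.weight = weight;
    }

    /** phi = exp(-||x - t||^2 / sigma^2), the Gaussian form assumed in this sketch. */
    double output(double[] x) {
        double r2 = 0.0;
        for (int q = 0; q < x.length; q++) {
            double d = x[q] - center[q];
            r2 += d * d;
        }
        return Math.exp(-r2 / (sigma * sigma));
    }
}

/** Hypothetical level 1 trainer: only the output weights are adjusted. */
final class DeltaRuleSketch {
    /** Four units centered on the binary input points, all weights zero (an assumption). */
    static GaussianUnit[] cornerUnits(double sigma) {
        double[][] corners = {{0, 0}, {0, 1}, {1, 0}, {1, 1}};
        GaussianUnit[] units = new GaussianUnit[corners.length];
        for (int j = 0; j < corners.length; j++) {
            units[j] = new GaussianUnit(corners[j], sigma, 0.0);
        }
        return units;
    }

    /** Steps 1 and 2: weighted sum of the hidden-layer outputs. */
    static double networkOutput(GaussianUnit[] units, double[] x) {
        double z = 0.0;
        for (GaussianUnit u : units) {
            z += u.weight * u.output(x);
        }
        return z;
    }

    /** Sum of squared errors over the data set with the current weights. */
    static double sse(GaussianUnit[] units, double[][] x, double[] target) {
        double sum = 0.0;
        for (int i = 0; i < x.length; i++) {
            double e = target[i] - networkOutput(units, x[i]);
            sum += e * e;
        }
        return sum;
    }

    /** Steps 3 to 5 for every pattern; returns the SSE accumulated during the epoch. */
    static double trainEpoch(GaussianUnit[] units, double[][] x, double[] target, double mu1) {
        double sum = 0.0;
        for (int i = 0; i < x.length; i++) {
            double e = target[i] - networkOutput(units, x[i]);
            sum += e * e;
            for (GaussianUnit u : units) {
                u.weight += mu1 * e * u.output(x[i]);   // Delta Rule update
            }
        }
        return sum;
    }

    /** Step 8: repeat until the SSE is acceptable or the epoch limit is reached. */
    static double train(GaussianUnit[] units, double[][] x, double[] target,
                        double mu1, int maxEpochs, double acceptableSse) {
        double last = Double.POSITIVE_INFINITY;
        for (int n = 0; n < maxEpochs; n++) {
            last = trainEpoch(units, x, target, mu1);
            if (last <= acceptableSse) {
                break;
            }
        }
        return last;
    }

    public static void main(String[] args) {
        double[][] x = {{0, 0}, {0, 1}, {1, 0}, {1, 1}};
        double[] orTargets = {0, 1, 1, 1};
        GaussianUnit[] units = cornerUnits(0.7);
        System.out.println("SSE before training: " + sse(units, x, orTargets));
        System.out.println("SSE of last epoch:   " + train(units, x, orTargets, 0.1, 500, 0.0));
    }
}
```

Because the centers in this sketch coincide with the four input patterns, the output weights alone can fit the OR targets; with arbitrarily placed centers the error would generally level off above zero.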
Each unit produces a value based upon the input vector, using the function NEURON.COMPUTE (see Appendix B), and the hidden-layer outputs, scaled by the output weights, are summed to provide a network output as described in Equations 2.1 and 2.8.

The highly portable nature of the Java environment also enables the network to be executed on a number of different platforms, which could allow for performance optimization analysis in future work. A graphical user interface (GUI) is provided by the system, and allows the user to choose from among the following operations: Train, Test, Run, Save, Help and Exit (see Figure 7).

Figure 7. The Main Menu of The Network Interface

The Exit function needs no particular explanation. The Help feature displays run-time text screens that provide information about the file system and the functions that are available. The Save option allows the user to write the reference vectors, sigma values and output weights of trained networks to output files for use in data analysis, which is accomplished using the Run function. Selecting Test allows for quick assessment of a newly trained network, processing a keyboard-generated input pattern and doing a quick comparison with an expected, user-defined output. The Train procedure, in which most of the system's complexity lies, is explained in the following sections. More details of the other relevant modules are also discussed below.

3.1.2. The File System

Aside from the six picture icons in the main menu, provided by Graphics Interchange Format files (.gif images), 5 file types are necessary to the basic functionality of this program when it is run as an application in the Java environment.
These are:

1) RBNet.java: the main program
2) NN1.rvf: a default reference vector
3) NN1.swf: default sigma and weight values
4) NN1.ipf: a default set of input patterns
5) NN1.itf: a default set of target values

Four file types are used to store information about a neural network:

1) filename.rvf: This file stores an n x m matrix of values. The number of rows (n) is the number of neurons and the number of columns (m) is the number of inputs to the network. (rvf = reference vector file)
2) filename.swf: This file stores a 2 x n matrix of values: each of the n rows contains two values. The first value is an initial Sigma, and the subsequent value is an initial weight from neuron to output. The number of rows is therefore 1 more than the n-value from the .rvf file. The number of rows in the file sets the number of neurons in the network architecture. (swf = sigma & weight file)

These first two files are produced when a newly trained network is saved. Two files, independent of the network's variables, are used to contain training and testing data:

3) filename.ipf: This file contains an n x m matrix of values in which the number of rows (n) is the number of input patterns and the number of columns (m) is the dimensionality of each vector. (ipf = input pattern file)
4) filename.itf: This file contains a 1 x n matrix of values, each row containing a single value: a target corresponding to an input pattern in the .ipf file. The number of targets listed here sets the number of patterns processed by the network during training. (itf = input target file)

The file filename.rrf is the report generated by the Run function, and is given a detailed examination in Section 3.1.5. Appendix C contains samples of each type of file used by the network for training and analysis.

3.1.3. The Train Function

The training option of the system brings into play the theoretical foundations discussed in Chapter 2, and produces networks based upon four files provided by the user.
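How such matrix files might be read can be sketched as follows. The exact on-disk layout is shown only in Appendix C, so this example assumes one matrix row per line with values separated by whitespace; that layout, the class name MatrixFileSketch and the method parseMatrix are hypothetical and are not taken from RBNet.java.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical reader for .rvf, .swf, .ipf and .itf style matrices. */
final class MatrixFileSketch {
    /**
     * Parses text holding one matrix row per line, values separated by
     * whitespace (an assumed layout). Blank lines are skipped.
     */
    static double[][] parseMatrix(String text) {
        List<double[]> rows = new ArrayList<>();
        for (String line : text.split("\\R")) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue;
            }
            String[] fields = trimmed.split("\\s+");
            double[] row = new double[fields.length];
            for (int c = 0; c < fields.length; c++) {
                row[c] = Double.parseDouble(fields[c]);
            }
            rows.add(row);
        }
        return rows.toArray(new double[0][]);
    }

    public static void main(String[] args) {
        // Made-up contents standing in for an input pattern file.
        double[][] patterns = parseMatrix("0 0\n0 1\n1 0\n1 1\n");
        System.out.println(patterns.length + " patterns of dimension " + patterns[0].length);
    }
}
```

Under this assumed layout, the row count of the parsed .ipf text would give the number of input patterns and the column count their dimensionality, matching the description of the n x m matrix above.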
When the Train function is selected, the system responds with a prompt requesting a filename. This variable is used to automatically identify the file containing the input patterns (filename.ipf), the list of expected outputs (filename.itf), the initial reference vector for the network to be trained (filename.rvf), and a set of sigmas for the neurons along with their associated weights for the output layer (filename.swf).

With these four files, the network is trained according to the user's selection from among three learning methods. The first implements the algorithm described in Section 2.4 according to the diagram shown in Figure 8. This is the simplest training method.

Figure 8. Training Level 1: Adjusting Output-Layer Weights

After the iterations contained within the Training Module are completed, two on-screen outputs are provided for the user. The training curve (Figure 9) is displayed, indicating the times at which the training began and ended, based upon the computer's local clock. The x-axis is the training epoch, and the y-axis is the SSE calculated from the cumulative difference between the expected and obtained outputs (e), as given by Equation 3.1, where n is the epoch and N is the number of neurons. The calibration of the y-axis is determined from the initial SSE of the untrained network. The cumulative error of the first epoch is used as the highest y-value (since the network is anticipated to reduce this quantity), and the axis displays 10% decrements of this total from the initial value down to 0, rounded to two decimal places.

Figure 9. Training Curve Displaying The SSE Values

The second display provided for the user is a diagram of the network's architecture, as exemplified by Figure 10.
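The y-axis calibration of the training curve can be illustrated with a short Java sketch. It assumes that the SSE is the plain sum of squared differences between targets and obtained outputs, and that the axis labels are the first-epoch SSE reduced in steps of 10% and rounded to two decimals; the class name SseAxisSketch and both method names are hypothetical.

```java
/** Hypothetical helper for the training-curve axis. */
final class SseAxisSketch {
    /** Sum of squared errors between targets and obtained outputs (assumed form). */
    static double sse(double[] target, double[] obtained) {
        double sum = 0.0;
        for (int i = 0; i < target.length; i++) {
            double e = target[i] - obtained[i];
            sum += e * e;
        }
        return sum;
    }

    /** Eleven labels: the initial SSE, then 10% decrements down to 0, two decimals each. */
    static double[] axisLabels(double initialSse) {
        double[] labels = new double[11];
        for (int k = 0; k <= 10; k++) {
            double value = initialSse * (10 - k) / 10.0;
            labels[k] = Math.round(value * 100.0) / 100.0;
        }
        return labels;
    }

    public static void main(String[] args) {
        for (double label : axisLabels(3.0)) {
            System.out.println(label);
        }
    }
}
```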
The size of the diagram depends upon the number of inputs and neurons of which the network is composed. While this is more useful as a display feature for smaller networks, it provides information on the Sigma value on the final epoch of training for one of the input patterns, which is selected by the user, and the final SSE achieved by the trained system. The final weights of the output layer are also displayed, as well as the expected and obtained values of the user's selected input pattern. In addition, the number of neurons and inputs is displayed at the bottom of the diagram.

Figure 10. Network Diagram Output

At this point, if the Save feature is selected from the main menu (see Figure 7), the user is prompted for a string variable network, and the trained network is saved, its distinctive values encapsulated in the files network.rvf and network.swf.

If level 2 training is selected, the learning algorithm follows the flowchart in Figure 11, the difference being an added complexity in the Training Module.

Figure 11. Training Level 2: Adjusting Weights and Centers

During training, in addition to the Delta Rule being used to modify the output layer weights (Equation 2.6), the reference vector of each neuron is adjusted according to the input patterns via Equation 2.10. Output screens such as those shown in Figures 9 and 10 are provided to the user when the iterative learning is complete, and the user may save the network at this point. As with all three of the training methods, the Save function must be invoked at this point; if the system is exited, or if Train is selected again, the current network will be lost or overwritten respectively.

Figure 12. Training Level 3: Adjusting Weights, Centers and Sigmas

Figure 12 shows the algorithm of Section 2.4 being run on a system using level 3 training.
In addition to the procedures involved in the previous two levels of training, the sigma values ($\sigma$) are adjusted using this technique, altering the variance of the Gaussian function and affecting the width of the decision boundary area (see Equation 2.13 and Figure 4). This directly affects the quality of the classification of input patterns, and serves to further tune the effect of the reference vectors and input weights.

3.1.4. The Test Function

Figure 13. Flowchart of The Test Function

The Test function, which invokes the algorithm described by Figure 13, allows a saved network to be quickly tested. The name of the network is requested from the user, at which point the corresponding swf and rvf files are loaded into memory. The user is then prompted for a set of inputs, the number of which corresponds to the size of the input vector anticipated by the system. A target value is then entered from the keyboard, the input values are compared against the reference vector of each neuron, and a Z-value is obtained from the system. No output files are written, but a diagram of the network is provided, showing the $\phi$ from each neuron, the values of the output weights, and the final network output compared to the expected value obtained from the user. This tool is particularly effective in smaller networks for verifying the accuracy of the system's classifications.

3.1.5. The Run Function

The Run feature of the system is used to produce the report files used for data analysis. It differs from the Test function in the following ways: it obtains its input patterns and targets from external files, it processes numerous input vectors in an iterative manner, and it provides a report file in addition to the on-screen output, which contains analysis information on the data. The report file is explained in the section following this one.
In Figure 14, filename1 represents the network and filename2 the data set comprised of input patterns and their expected output values. For each input pattern, the values of the vectors are processed by the Gaussian neurons, and the linear summation of these outputs (Z), as affected by the output weights, is compared with each expected value.

Figure 14. Diagram of The Run Function Algorithm

Each obtained Z value, also called the predicted value, is stored in an array along with each corresponding target from the itf file (the expected result), and this information is passed on to the Run Report File (filename2.rrf).

3.1.6. The Classification and Report Module

The following text is an actual rrf that resulted from the analysis of XOR files for input and output on a network trained to recognize that function:

    ***************************************************************
    NNet Files: XORt.swf & XORt.rvf
    Data Files: XOR.ipf & XOR.ipf
    Date: 06/09/03   Time: 10:06:45 PM
    ***************************************************************
    For each pattern, T is the Target and Z is the predicted value.

    Pattern 0:  T: 0.0   Z: 0.0035512790504252673
    Pattern 1:  T: 1.0   Z: 1.0071977444230102
    Pattern 2:  T: 1.0   Z: 0.9979889896543631
    Pattern 3:  T: 0.0   Z: 0.08665013438300243

    Threshold: 0.9979889896543631
    Training SSE: 0.0 (if trained this session)
    Note: If Z < Threshold, it is considered a 0.

    Rounded Z scores:
    Pattern 0:  T: 0.0   Z: 0.0
    Pattern 1:  T: 1.0   Z: 1.0
    Pattern 2:  T: 1.0   Z: 1.0
    Pattern 3:  T: 0.0   Z: 0.0

    Correct: 4, Misclassified: 0, Accuracy: 100.0%

The main sections of the report consist of two columns. The first column is the list of target values for a set of input patterns obtained from an itf file. The second is the list of corresponding Z values resulting from those patterns being classified by a network.
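The threshold reported in the file above can be reproduced with a small Java sketch. It assumes that every obtained Z value is tried in turn as a candidate threshold, that values below the candidate count as 0 and all others as 1 (as the report's note states), and that the first candidate with the most correct classifications is kept. The class ThresholdSketch and its methods are hypothetical names, not the thesis's Output Data module.

```java
/** Hypothetical threshold search over the obtained Z values. */
final class ThresholdSketch {
    /** Counts patterns classified correctly when Z < threshold maps to 0, otherwise to 1. */
    static int correctAt(double threshold, double[] targets, double[] z) {
        int correct = 0;
        for (int i = 0; i < z.length; i++) {
            double rounded = (z[i] < threshold) ? 0.0 : 1.0;
            if (rounded == targets[i]) {
                correct++;
            }
        }
        return correct;
    }

    /** Tries each Z value as the threshold and keeps the first one with the best count. */
    static double bestThreshold(double[] targets, double[] z) {
        double best = z[0];
        int bestCorrect = -1;
        for (double candidate : z) {
            int correct = correctAt(candidate, targets, z);
            if (correct > bestCorrect) {
                bestCorrect = correct;
                best = candidate;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Targets and Z values copied from the XOR report.
        double[] targets = {0.0, 1.0, 1.0, 0.0};
        double[] z = {0.0035512790504252673, 1.0071977444230102,
                      0.9979889896543631, 0.08665013438300243};
        double threshold = bestThreshold(targets, z);
        System.out.println("Threshold: " + threshold);
        System.out.println("Correct: " + correctAt(threshold, targets, z));
    }
}
```

For the four XOR patterns this search selects the Z value of Pattern 2, which agrees with the threshold printed in the report.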
The first half of the report provides the actual values obtained from the system, and the second part shows these outputs rounded to a threshold value. This number is obtained automatically by the Output Data module of the Run feature, which uses each actual Z value iteratively as a threshold, keeping track of which score provides the best final percentage accuracy. Any number less than the threshold is considered a 0 for classification purposes, and the other Z scores are rounded up to 1. For networks in which a binary classification is not being performed, this part of the report can be ignored.

At the bottom of the file is a report summary consisting of the number of rightly and wrongly identified patterns and a percentage figure representing the network's accuracy with regard to that data set. For another example of this and other types of files, please see Appendix C.

3.2. Implementation of Mathematical Formulae

The output of each neuron ($\phi$) is calculated within the module NEURON.COMPUTE, which is coded in this way:

    R = 0;
    for (int q = 0; q < inpnum; q++)
    {
        R += ((invector[q] - refvector[n][q])*(invector[q] - refvector[n][q]));
    }
    R = Math.sqrt(R);
    outvector[n] = Math.exp(-((R)*(R))/((Sigma[n]*Sigma[n])));

Referring to Equation 2.1, R is the term $\lVert x - t_j \rVert$, outvector[n] is the $\phi$ value associated with each n-th neuron, and inpnum is the size of the input vector.

Equation 2.12, which provides the $\phi'$ term used in both the moving centers formula (Equation 2.10) and the adjustable Sigma technique (Equation 2.13), is calculated in NEURON.COMPUTE by means of the following code, in which toutvector[n] is $\phi'$:

    for (int m = 0; m < inpnum; m++)
    {
        toutvector[n][m] = (2(2*refvector[n][m]))*outvector[n];
    }

for each neuron n and each input pattern m. This code segment immediately follows the derivation of outvector[n], and uses that value in its own mathematical calculations. For the code of the entire NEURON.COMPUTE module, please see Appendix B.
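A derivative term of this kind can be cross-checked numerically. The sketch below assumes the Gaussian form used in the code above, $\phi = \exp(-\lVert x - t \rVert^2 / \sigma^2)$, and differentiates it with respect to one component of the center by ordinary calculus, giving $2 (x_m - t_m) \phi / \sigma^2$. This expression is an assumption of the example and may differ in scaling from Equation 2.12; the class and method names are hypothetical.

```java
/** Hypothetical check of a Gaussian unit's derivative with respect to its center. */
final class GaussianDerivativeSketch {
    /** phi = exp(-||x - t||^2 / sigma^2). */
    static double phi(double[] x, double[] t, double sigma) {
        double r2 = 0.0;
        for (int q = 0; q < x.length; q++) {
            double d = x[q] - t[q];
            r2 += d * d;
        }
        return Math.exp(-r2 / (sigma * sigma));
    }

    /** Analytic partial derivative of phi with respect to center component m. */
    static double dPhiDCenter(double[] x, double[] t, double sigma, int m) {
        return 2.0 * (x[m] - t[m]) / (sigma * sigma) * phi(x, t, sigma);
    }

    /** Central finite difference of phi with respect to center component m. */
    static double finiteDifference(double[] x, double[] t, double sigma, int m, double h) {
        double[] plus = t.clone();
        double[] minus = t.clone();
        plus[m] += h;
        minus[m] -= h;
        return (phi(x, plus, sigma) - phi(x, minus, sigma)) / (2.0 * h);
    }

    public static void main(String[] args) {
        double[] x = {1.0, 0.0};
        double[] t = {0.2, 0.5};
        System.out.println("analytic: " + dPhiDCenter(x, t, 0.7, 0));
        System.out.println("numeric:  " + finiteDifference(x, t, 0.7, 0, 1e-5));
    }
}
```

Agreement between the two printed values indicates that a center moved in the direction of this derivative, scaled by the error and the output weight, changes the unit's output in the intended direction.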
The following module COMPUTE, not to be confused with the internal neuron function NEURON.COMPUTE, obtains the Z values (output of the system) by summing the outvectors ($\phi$ terms) for each neuron as they are affected by the output weights:

    public void Compute(int inptrn) throws IOException
    {
        Zval = 0;
        for (int q = 0; q < inpnum; q++)
        {
            invector[q] = x[inptrn][q];
        }
        for (int n = 0; n < neunum; n++)
        {
            NeuronCell[n].Compute(n);
            Zval += (outvector[n]*weight[n]);
        }
    }

where inptrn is the row of the data set, corresponding to one record of input values, and inpnum is the total number of inputs associated with each input vector.

The error term ($e_i$), which is key to all levels of learning, is computed in the TRAIN function using the Z-value from the COMPUTE function by simply subtracting it from the target value associated with each row of the data set:

    Compute(row);
    e = (target[row] - Zval);

The Delta Rule for training the output weights, defined in Equation 2.6, is also implemented within the function TRAIN, by means of the following code:

    Wdelta = e*outvector[n];
    weight[n] = weight[n] + MU1*Wdelta;   // Change the weights for each neuron n.

The formula used to adjust the reference vectors of the neurons (Equation 2.10) is coded as follows:

    if (tsel > 1)   // If training method is 2 or 3...
    {
        for (int m = 0; m < inpnum; m++)   // Change the ref. vectors
        {
            Rdelta = 2*weight[n]*e*toutvector[n][m]*(invector[m] - refvector[n][m]);
            refvector[n][m] = refvector[n][m] + MU2*Rdelta;
        }
    }

The spread adjustment performed on the neurons by modifying the sigma values, shown in Equation 2.13, is represented in Java by this segment:

    if (tsel > 2)   // If training method is 3...
    {
        Sdelta = (-1)*weight[n]*e*Mprod;   // Change Sigma values
    }

The TRAIN module in its entirety, from which the Mprod value is obtained, can be found in Appendix B.

3.3.
Basic Functions A: Delta Rule Training (Fixed Centers)

In order to test the effectiveness of the neural network described here, the system was applied to the binary functions OR, AND and XOR. Using arbitrarily assigned reference vectors for 4 neurons, the training curves shown in Figure 15 were obtained after 500 epochs:

Figure 15. SSE Curves For OR, AND and XOR Respectively (Set 1)

The results of running the network on a full set of the possible binary inputs yielded the data shown in Table 1.

Table 1. Level 1 Training Results During Network Testing

    Function   Final SSE   Network Accuracy   Last non-0 SSE epoch
    OR         0.84        100%               00
    AND        0.43        100%               00
    XOR        0.05        100%               00

A 00 in the Last non-0 SSE epoch column indicates that the SSE did not drop to 0 within the maximum number of epochs for which the network was trained.

3.4. Basic Functions B: Delta Rule vs. Fixed Centers and Delta

Although the network performed perfectly in classifying the input patterns for the three binary test functions, it was observed that the SSE values did not fall to an acceptable level within a reasonable amount of time. For a network with four neurons and such elementary binary functions, the rate of learning was anticipated to be more dramatic. Since the purpose of the system was to analyse data with a much higher degree of complexity, Level 2 training was next used to adapt networks to these functions in an attempt to generate a sharper reduction in the cumulative errors of the training epochs. See Figure 16 for the learning curves thus obtained.

Figure 16. SSE Curves For OR, AND and XOR Respectively (Set 2)

Using the Run function on each of the three networks with the complete set of binary inputs provided the information shown in Table 2 below:

Table 2.
Level 2 Training Results During Network Testing

    Function   Final SSE   Network Accuracy   Last non-0 SSE epoch
    OR         0.97        100%               00
    AND        0.00        100%               233
    XOR        0.00        100%               414

Although the OR function was classified less accurately than the other two binary processes, which may be related to the random nature of the initial centers, the performance of the network was appreciably improved when the center adjustment feature was implemented. The curves in the XOR test show the most dramatic improvements, the SSE dropping so sharply after certain epochs that the display curve is dotted.

3.5. Basic Functions C

Both the curves and the tabular results obtained from level 2 training reveal that the adjustment of the reference vectors within the Gaussian neurons provides higher rates of learning on the binary functions used to test the system. In both these methods, however, the spread of the radial basis function remained constant at 0.4. These variables were therefore modified and the networks retrained on each function to demonstrate the significance of the $\sigma$ value to network performance.

3.5.1. Manually Adjusting the Sigma

The following SSE and accuracy values were obtained after 300 epochs by varying the Sigma value for each of the three functions OR, AND and XOR, and for a reduced data set consisting of 50 records of the transportation data (LA50), in both level 1 and level 2 training:

Table 3. SSE and Accuracy After 300 Epochs For Level 1 Training

    Function\σ   0.01    0.05    0.10    0.20    0.40    0.70    1.00    2.00    5.00    7.00    10.0
    OR           3.0     2.99    2.99    2.32    0.96    0.13    0.23    0.49    0.76    0.77    0.76
    AND          1.0     1.0     1.0     0.99    0.51    0.2     0.33    0.64    0.75    0.76    0.77
    XOR          2.0     1.99    1.37    0.42    0.08    0.44    0.87    1.04    1.03    1.03    1.04
    LA50         11.0    11.0    11.01   11.01   11.0    10.96   10.36   6.69    6.48    7.6     8.75
    LA50run      78      78      78      78      78      78      78      86      88      86      80
    LAver1       84.85   84.85   84.85   84.85   84.85   84.85   84.85   85.85   84.85   84.85   84.85

Table 4.
SSE and Accuracy After 300 Epochs For Level 2 Training

    Function\σ   0.01    0.05    0.10    0.20    0.40    0.70    1.00    2.00    5.00    7.00    10.0
    OR           3.0     2.99    2.99    1.36    0.98    0.04    0.25    0.39    0.72    0.75    0.76
    AND          1.0     1.0     1.0     0.99    0.00    0.00    0.04    0.33    0.67    0.72    0.75
    XOR          2.0     1.98    0.99    0.00    0.01    0.11    0.49    1.03    1.03    1.03    1.04
    LA50         11.0    11.0    11.0    11.02   11.0    10.93   9.01    6.22    7.12    7.24    8.69
    LA50run      78      78      78      78      78      78      82      92      88      88      88
    LAver1       84.85   84.85   84.85   84.85   84.85   84.85   84.85   84.85   84.85   84.84   84.84

The first three rows of Tables 3 and 4 consist of the SSE values obtained after the indicated number of epochs for each action listed in the Function column on the left. The fourth row shows this value for the reduced data set. Row five is the percentage of accuracy achieved by training, and row six is the percentage of accuracy the trained network obtained when run on a validation set of 99 random records from the original spreadsheet.

As these tables indicate, there is a definite point for each type of training at which the sum of squared errors is lowest, and the training accuracy is at a peak. For the subset of the actual data, the LA50 training set, using level 1 training and a consistent sigma value of 5.0 in each Gaussian neuron provides the best results. For level 2 training, which adjusts both the output weights and the reference vectors, the lowest SSE value and the highest degree of accuracy are both observed when the neurons use a Sigma of 2.0 for calculations. Further testing using values even higher than 10.0 produced a consistent degradation from these peak values.

The different values of Sigma for which each function performs best should be noted. No single value provides optimum learning; the final SSE also depends on the initial reference vector and weights, the training data provided, and the nature of the function itself.

3.5.2.
Automatically Adjusting the Sigma

The data from Section 3.5.1 justifies the use of further refinements to the training algorithms utilized by the network, since manually adjusting Sigmas for each new network would be a time-consuming task. This would be rendered even more impractical if the $\sigma$ variable of each neuron were adjusted individually for optimum performance. Because of this, a third level of training was implemented within the system. This automatic adjustment of the sigma value for each neuron during the course of the training generated the results shown in Figure 17 after 500 epochs:

Figure 17. SSE Curves For OR, AND and XOR Respectively (Set 3)

Table 5. Level 3 Training Results During Network Testing

    Function\Results   Final SSE   Network Accuracy   Last non-0 SSE epoch
    OR                 0.39        100%               00
    AND                0.00        100%               181
    XOR                0.00        100%               250

3.6. Analysis of Preliminary Training (OR, AND, XOR)

Table 6. Summary of Training Results on Binary Functions

    Function   Lv1 FSSE   Lv1 LNZE   Lv2 FSSE   Lv2 LNZE   Lv3 FSSE   Lv3 LNZE
    OR         0.84       00         0.97       00         0.39       00
    AND        0.43       00         0.00       233        0.00       181
    XOR        0.05       00         0.00       414        0.00       250

The Lv value is the level of training as described in Sections 3.3 to 3.5, the FSSE term is the final sum of squared errors value, and LNZE represents the last epoch of training in which the SSE was not 0 (Last Non-Zero Epoch).

This information shows a significant improvement with each new level of training used by the system, and the superior performance of level 3 training in every test case run. In 500 epochs of Level 1 training, the average SSE of the three binary functions listed is 0.44, and the network is unable in this time to recognize any of them perfectly. By adding the ability to shift reference vectors within neurons to the training method, the average SSE becomes 0.3233, with both AND and XOR patterns learned perfectly. Level 3 training cuts the average SSE down to a mere 0.13, and while the OR function does not converge in this period, its error is minimized using automatic Sigma adjustment, and the patterns which do converge do so in an average of 108 fewer training cycles.

Chapter 4
Application Initialization

4.1. Description of Data

Appendix A gives a full listing of the variables comprising the reduced data set used for this thesis work, and their descriptions. The dependent variable, which the various regression and network techniques are attempting to predict, is the DELTA_VTR, or the change in the Vehicle Trip Rate over the course of time in which the incentive plans and their associated costs, represented by the independent variables (i.e. predictors), were implemented.

In order to aid in the analysis, different degrees of impact were sorted into 8 classes or bins, as described by Table 7 below. A drop in VTR by more than 7 indicates that the combination of incentives was very effective, and this record is sorted into Bin 1. A record of incentive programs that produced a reduction between 4 and 7 is classified as belonging to Bin 2. Very minor changes identify an incentive combination as Bin 4 or 5, and if the VTR begins to increase, the record is sorted into Bins 5-7, depending on the degree of impact.

Table 7. Classification of Data and The Bin Ranges

    Range   < -7   -7 to -4   -4 to -2   -2 to 0   0 to 1   1 to 2.5   2.5 to 5   > 5
    Bin #   1      2          3          4         5        6          7          8

The data on which the RBF network analysis is being performed consists of 16,302 such records, already divided into bins for training. The number of records in Bin 1 is 2543. Bin 2 holds 2310, Bin 3 has 2192, and there are 2635 records that are classified as Bin 4. 1414 records belong to Bin 5, 1479 patterns result in changes within the range of Bin 6, 1714 records are sorted into Bin 7, and Bin 8 holds 2015 data patterns.
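The sorting of a DELTA_VTR value into one of the 8 bins can be sketched in Java as follows. The sketch assumes that reductions in VTR are negative values, so that the first four ranges of Table 7 are bounded by -7, -4, -2 and 0, and that a value lying exactly on a boundary falls into the higher-numbered bin; neither point is stated explicitly in the text. The class BinSketch and its method are hypothetical names.

```java
/** Hypothetical mapping from a DELTA_VTR value to its bin number (1 to 8). */
final class BinSketch {
    // Upper limits of Bins 1 to 7; anything at or above the last limit is Bin 8.
    private static final double[] UPPER_LIMITS = {-7.0, -4.0, -2.0, 0.0, 1.0, 2.5, 5.0};

    static int bin(double deltaVtr) {
        for (int b = 0; b < UPPER_LIMITS.length; b++) {
            if (deltaVtr < UPPER_LIMITS[b]) {
                return b + 1;
            }
        }
        return 8;
    }

    public static void main(String[] args) {
        double[] samples = {-8.0, -5.0, -3.0, -1.0, 0.5, 2.0, 3.0, 6.0};
        for (double s : samples) {
            System.out.println(s + " -> Bin " + bin(s));
        }
    }
}
```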
This relatively even split of patterns among the bins renders padding the data by duplicating random records unnecessary, although this technique was used to balance the training methods used on previous, unrefined sets of the same data.

4.2. CUTR Analysis Results

The accuracies of previous attempts to predict the bin into which data patterns of this set fall are presented in the following tables. Four types of classification were attempted using regression techniques by researchers at the Center for Urban Transportation Research (CUTR). Models were built using the software SPSS 11.1 for Windows, and they were designed according to the following techniques:

4.2.1. Forward Regression

The system begins by computing which of the predictor variables has the largest bivariate correlation with (i.e. impact on) the DELTA_VTR value, and then other variables are selected based upon their relative contribution to the variance in the dependent variable. Predictors that do not have a significant impact are ignored.

4.2.2. Backward Regression

This method begins by using all the potential predictor variables, and then deletes those that do not correlate to the variance within a significance level of 90%. The deletion of variables stops when all the factors calculated to be relatively useless are gone.

4.2.3. Stepwise Regression

Stepwise Regression combines both the forward and backward concepts, providing a more complex selection process. This method has proven to be the most effective at actual classification, and provided the figures against which the Radial Basis Network was most effectively compared. This procedure both adds and subtracts predictor variables based upon intercorrelations between those already selected. A simple example of this is that if the inclusion of one variable weakens the impact of an already-selected variable, the new one is not included in the final set produced.

4.2.4.
Force-entered Regression

This simplest of techniques merely accepts all the variables specified, regardless of their correlation with the dependent variable.

Based on the variables selected by all these techniques, and the full set accepted by force-entered regression, two sets of neural networks were developed, one with no hidden units and the other with a number of such units selected by the SPSS software as optimal. The results of both the regression and network analyses are included in this report.

In the tables below, Exact T. is the accuracy of the network in classifying the records of its training set precisely. That is, if a record from the training set is expected to produce a Bin value of 8, the network correctly classifies it if and only if it produces a value of 8 for that record. 1-Off T. is the ability of a technique to classify the pattern within an error range of 1. Exact V. is the ability of a network to precisely predict the Bin into which a data record of an unseen Validation set falls. 1-Off V. considers a classification correct if, for example, the expected Bin number is 4, and the returned value (Z) is 4 ≤ Z ≤ 6. The values given describe both the accuracy of various regression techniques and neural network analyses based on the variables obtained from the respective regression methods.

Set A: Forced Regression

Table 8. Force-Entered Regression Results
Classtype   Total   Bin1    Bin2    Bin3    Bin4    Bin5    Bin6    Bin7    Bin8
Exact V.    16.75    0.86   18.88   33.71   30.09   12.50   14.77    9.16    6.35
1-Off V.    16.63    1.75   16.81   33.28   30.37   11.31   14.03   10.41    6.50
Exact T.    53.92    0.00    0.00    0.00   65.75   44.56   40.00    0.00    0.00
1-Off T.    47.36   27.15   50.56   76.10   71.73   54.61   34.17   27.98   23.88

Table 9. Force-Entered Network Results (0 Hidden Units)
Classtype   Total   Bin1    Bin2    Bin3    Bin4    Bin5    Bin6    Bin7    Bin8
Exact V.    15.92    0.29   16.08   31.74   33.81   12.66   15.53    8.40    2.21
1-Off V.    15.81    0.91   14.21   32.69   33.00   12.16   15.00    9.46    1.69
Exact T.    46.51   23.05   50.35   74.72   76.36   62.45   32.95   26.34   17.40
1-Off T.    46.68   24.10   47.12   75.84   74.97   61.53   35.88   27.34   16.68

Set B: Backward Regression

Table 10. Backward Regression Results
Classtype   Total   Bin1    Bin2    Bin3    Bin4    Bin5    Bin6    Bin7    Bin8
Exact V.    15.96    8.93   33.92   28.09   20.35    9.62   10.98    7.63    4.42
1-Off V.    16.81    1.49   17.01   34.57   30.28   11.47   13.99   10.52    6.61
Exact T.    45.80   42.65   68.53   75.28   62.39   39.90   24.24   20.99   18.23
1-Off T.    47.22   26.72   50.39   76.38   71.75   53.98   33.69   28.05   23.81

Table 11. Backward Regression Network Results (0 Hidden Units)
Classtype   Total   Bin1    Bin2    Bin3    Bin4    Bin5    Bin6    Bin7    Bin8
Exact V.    16.16    0.29   15.03   33.43   34.52   10.97   15.53    8.78    3.04
1-Off V.    16.08    0.65   14.42   35.13   32.72   11.93   13.36   10.07    2.65
Exact T.    46.75   23.34   51.05   75.28   76.12   59.49   31.82   25.57   21.27
1-Off T.    46.95   23.58   48.50   76.35   75.03   58.42   35.27   26.99   20.39

Table 12. Backward Network Results (With Hidden Units)
Classtype   Total   Bin1    Bin2    Bin3    Bin4    Bin5    Bin6    Bin7    Bin8
Exact V.    17.26    5.48   19.93   25.28   36.17   18.14   12.88    8.78    5.26
1-Off V.    17.66    5.99   17.28   26.03   37.09   16.38   14.56    9.76    5.78
Exact T.    48.09   28.82   47.90   75.00   72.58   66.24   36.36   27.86   22.93
1-Off T.    48.15   31.00   45.64   72.41   73.09   65.71   38.14   27.95   23.11

Set C: Forward Regression

Table 13. Forward Regression Results
Classtype   Total   Bin1    Bin2    Bin3    Bin4    Bin5    Bin6    Bin7    Bin8
Exact V.    16.40    0.58   16.08   34.55   30.53   10.58   14.02    9.54    6.35
1-Off V.    16.73    1.10   16.84   34.60   30.80   10.58   13.77   10.29    6.78
Exact T.    46.43   25.65   53.50   74.72   71.24   54.33   31.06   27.10   22.65
1-Off T.    47.42   26.34   50.76   76.65   72.46   54.35   34.13   27.55   23.98

Table 14. Forward Regression Network Results (0 Hidden Units)
Classtype   Total   Bin1    Bin2    Bin3    Bin4    Bin5    Bin6    Bin7    Bin8
Exact V.    16.12    8.07   18.18   22.75   33.10   10.55   15.15    7.63    6.35
1-Off V.    16.12    0.52   14.15   34.66   33.50   11.93   14.34   10.11    2.03
Exact T.    46.12   35.73   50.70   62.36   72.81   60.34   30.30   23.28   24.03
1-Off T.    46.85   23.10   48.20   76.21   74.97   58.79   35.40   27.22   20.12

Table 15. Forward Regression Network Results (With Hidden Units)
Classtype   Total   Bin1    Bin2    Bin3    Bin4    Bin5    Bin6    Bin7    Bin8
Exact V.    17.07   11.82   16.08   17.70   39.95   18.14   12.12    5.73    6.63
1-Off V.    18.30    7.42   17.62   25.38   39.84   18.61   14.12    8.58    5.57
Exact T.    47.22   38.33   50.35   61.24   74.47   67.51   31.82   22.90   23.20
1-Off T.    48.56   31.65   45.64   73.70   75.23   68.03   37.96   25.00   22.46

Set D: Stepwise Regression

Table 16. Stepwise Regression Results
Classtype   Total   Bin1    Bin2    Bin3    Bin4    Bin5    Bin6    Bin7    Bin8
Exact V.    16.20    0.58   15.73   33.99   30.31   10.58   13.64    9.54    6.35
1-Off V.    16.81    1.49   17.01   34.57   30.28   11.47   13.99   10.52    6.61
Exact T.    46.24   25.65   53.50   74.16   70.58   54.33   30.68   27.86   22.38
1-Off T.    47.22   26.72   50.39   76.38   71.75   53.98   33.29   28.05   23.81

Table 17. Stepwise Regression Network Results (0 Hidden Units)
Classtype   Total   Bin1    Bin2    Bin3    Bin4    Bin5    Bin6    Bin7    Bin8
Exact V.    17.23   17.87   22.73   20.22   25.06    8.86   18.56   11.07    9.12
1-Off V.    16.21    0.42   13.74   34.49   34.45   11.79   14.42   10.30    1.93
Exact T.    45.09   48.13   53.15   58.15   55.56   53.16   34.09   25.57   27.62
1-Off T.    46.65   22.38   47.79   76.55   75.03   58.61   35.40   26.72   19.88

Table 18. Stepwise Regression Network Results (With Hidden Units)
Classtype   Total   Bin1    Bin2    Bin3    Bin4    Bin5    Bin6    Bin7    Bin8
Exact V.    17.07   10.37   19.58   19.94   40.43   12.66   12.88    6.11    5.25
1-Off V.    17.59    5.54   18.22   25.45   39.77   15.03   15.58    9.49    2.10
Exact T.    47.22   38.90   51.40   58.43   70.45   64.98   32.95   25.19   28.45
1-Off T.    47.89   29.87   44.43   75.03   72.57   68.12   37.61   27.41   20.67

Table 19. Stepwise Regression Network Results With Costs (8G1)
Classtype   Total   Bin1    Bin2    Bin3    Bin4    Bin5    Bin6    Bin7    Bin8
Exact V.    16.00    3.17   16.08   23.88   43.50   14.77    8.71    6.11    1.66
1-Off V.    17.52    4.83   16.84   25.59   41.83   18.19   13.36    7.31    2.10
Exact T.    46.31   25.65   44.41   73.88   75.65   71.73   34.09   23.66   14.92
1-Off T.    47.87   27.79   43.89   73.56   77.76   72.30   36.99   24.20   17.85

Table 20. Stepwise Results: Hidden Units/Reduced Dataset (8IOB1)
Classtype   Total   Bin1    Bin2    Bin3    Bin4    Bin5    Bin6    Bin7    Bin8
Exact V.    16.92    7.19   19.55   27.59   35.52   13.29   12.10    6.99    6.56
1-Off V.    18.30    7.75   22.68   30.34   33.02   14.29   12.64    9.33    8.83
Exact T.    47.82   31.85   54.09   77.78   67.24   56.65   36.94   27.42   23.94
1-Off T.    49.53   37.48   54.37   76.05   68.92   58.77   35.36   27.19   27.89

4.3. Selection of Network Architecture

4.3.1. Number and Selection of Inputs

Appendix A describes the list of variables considered to be relevant by a Stepwise Regression analysis technique, which corresponds to Table 20 of Set D. As the Total column of the tables indicates, this technique provides the most accurate prediction results, particularly when the variables it selects are used to train a neural network that contains hidden units. The Radial Basis network implemented for this thesis uses this list of variables for training in an attempt to utilize this already established information.

4.3.2. Number of Neurons and Epochs

As mentioned in Section 2.2, overfitting of a network is a very real concern when doing analysis of data. Networks may show very good results in classifying their training sets; however, when applied to a validation set, they may produce results that are less desirable than expected. Using too large a network or too long a training period can lead to redundancy, and such a network will amplify the effects of noise within the input patterns, resulting in this poor performance [14]. In order to avoid this, a network containing as few neurons as possible, while maintaining good training results, is used to obtain the highest degree of generalization, and thus the best validation results.

Testing to discover the optimal network size was performed by varying the number of network neurons in an attempt to classify the records of Bin 3, the most accurately classified class of patterns in the CUTR experiments.
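As a rough illustration of the error measure tracked in these network-size trials, the sketch below computes the output of a small Gaussian RBF network and its sum of squared errors (SSE) over a set of patterns. It follows the general form of the Compute routine in Appendix B (a Gaussian of the squared distance divided by sigma squared, followed by a weighted sum). The class name RbfSketch, the method names output and sse, and the toy values in main are assumptions made for illustration only; this is not the thesis implementation.

```java
public class RbfSketch {
    // Network output: Z = sum over neurons of weight[n] * exp(-||x - center[n]||^2 / sigma[n]^2).
    static double output(double[] x, double[][] centers, double[] sigma, double[] weight) {
        double z = 0.0;
        for (int n = 0; n < centers.length; n++) {
            double r2 = 0.0; // squared Euclidean distance to the reference vector
            for (int q = 0; q < x.length; q++) {
                double d = x[q] - centers[n][q];
                r2 += d * d;
            }
            z += weight[n] * Math.exp(-r2 / (sigma[n] * sigma[n]));
        }
        return z;
    }

    // Sum of squared errors over all patterns for a fixed set of network parameters.
    static double sse(double[][] inputs, double[] targets,
                      double[][] centers, double[] sigma, double[] weight) {
        double total = 0.0;
        for (int row = 0; row < inputs.length; row++) {
            double e = targets[row] - output(inputs[row], centers, sigma, weight);
            total += e * e;
        }
        return total;
    }

    public static void main(String[] args) {
        // Hypothetical toy values: one neuron centred on the only "1" pattern of an AND problem.
        double[][] inputs = { {0, 0}, {1, 1}, {1, 0}, {0, 1} };
        double[] targets = { 0, 1, 0, 0 };
        double[][] centers = { {1, 1} };
        double[] sigma = { 0.4 };
        double[] weight = { 1.0 };
        System.out.println("SSE = " + sse(inputs, targets, centers, sigma, weight));
    }
}
```

Adding neurons gives the weighted sum more terms with which to fit the training targets, which is why a falling training SSE on its own says little about validation performance.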
Table 21 reveals that significant changes relating to training accuracy are obtained in reasonable time with about 2000 neurons, and this number was used for obtaining the main training and validation results of this thesis.

It should be noted that, in rare cases, certain values cause a NaN (not a number) error during calculations. This is most likely due to a rounding problem when dealing with very small doubles, so that divisions by 0 are an occasional concern. Testing has shown that the easiest way to alleviate the difficulty is to slightly increase or decrease the number of neurons in the system. This does not affect the training or validation accuracies to any detectable degree, and allows the network to avoid the potential rounding difficulty.

Table 21. Varying The Number of Neurons For Classifying Bin 3
Neurons   Time (seconds)   Initial SSE   Final SSE   Delta SSE
100             17            2055         1923         132
200            123            2000         1781         219
300            229            2031         1690         341
400            611            1999         1607         392
500            563            1980         1517         463
600            613            1949         1427         522
700           1024            1909         1329         580
800           1440            1895         1250         645
900            783            1845         1185         660
1000          1201            1809         1097         712
1100           957            1780         1026         754
1200           997            1751          951         800
1300          1056            1721          880         841
1400          1094            1690          805         885
1500           118            1669          735         934
1600           124            1636          663         973
1700          1567            1604          597        1007
1800          1646            1577          527        1050
1900          1565            1552          454        1098
2000          3034            1520          391        1129

Figure 18. Diagram of Training Results with Variable Neurons

As Table 23 indicates, the SSE values are not always directly proportional to the classification accuracy of the system. Therefore, when testing the network for a reasonable number of epochs that would not result in overtraining, the SSE values were not considered, only the degree of accuracy resulting from the report module. These results are shown in Table 22. Based on the data in Tables 21 and 22, a system of 2000 neurons, trained for 1000 epochs, was selected for use by the Radial Basis Network.

Table 22. Varying The Number of Epochs For Classifying Bin 3
No. of Epochs   Training Accuracy   Validation Accuracy
500                  99.049               85.75
600                  99.055               85.74
700                  99.055               85.75
800                  99.067               85.75
900                  99.049               85.75
1000                 99.086               85.75
1100                 99.092               85.75
1200                 99.092               85.75

Chapter 5
Execution and Analysis

5.1. Training and Testing Results on Application Data

Two distinct types of analysis are attempted by the Radial Basis Network. The first is an 8-level binary classification. In essence, the problem of classifying data patterns into eight bins is split into eight separate problems. Eight different networks are used, each of which is designed to determine whether or not a particular data pattern belongs to a specific bin. For example, the Bin1 network produces a binary result: if a data pattern belongs to Bin 1, the Bin1 network will output a 1, or it will produce a 0 if the pattern is of another Bin. The same holds for Bins 2-8.

Table 23. Network Training Results
Bin Trained   Initial SSE   Final SSE   Delta SSE   Initial Accuracy   Final Accuracy   Delta Accuracy
Bin1            1790          368.99     1421.01        87.72             97.49             9.77
Bin2            1570           86.09     1483.91        87.52             99.48            11.96
Bin3            1540          147.41     1392.59        88.55             99.09            10.54
Bin4            2106          564.45     1541.55        84.27             96.09            11.82
Bin5             948            2.79      945.21        93.74             99.98             6.24
Bin6             919            2.44      916.56        92.22             99.98             7.76
Bin7            1083            3.13     1079.87        91.84             99.98             8.14
Bin8            1286           17.61     1268.39        90.31             99.88             9.57
Average         1405.25       149.114    1256.136       89.521            98.996            9.475
Macro1          2080          634.46     1445.54        15.32             23.01             7.69
Macro2          2080          629.95     1450.05        15.32             23.39             8.07

The second type of analysis, corresponding to the rows labelled Macro1 and Macro2 in Table 23, is an attempt to directly duplicate the Regression and Backpropagation Network results by training one network to recognize all eight types of Bins. The results of these two approaches are discussed in Section 5.2.

Figure 19. Effects of Training on SSE Values

Figure 20. Effects of Training on Accuracy

Table 24. Network Classification Results
Type      Bin1    Bin2    Bin3    Bin4    Bin5    Bin6    Bin7    Bin8    Avg.     OVR      Macro1   Macro2
ExactV.   84.22   85.90   85.75   83.01   91.67   91.46   89.83   85.91   87.255    0.435   17.62    16.75
1-OffV.   N/A     N/A     N/A     N/A     N/A     N/A     N/A     N/A     N/A       0.653   46.38    45.89
ExactT.   97.49   99.48   99.09   96.09   99.98   99.98   99.98   99.88   98.996   92.056   23.01    23.39
1-OffT.   N/A     N/A     N/A     N/A     N/A     N/A     N/A     N/A     N/A      92.203   57.04    57.87

The numbers provided in Table 24, and shown in Figures 19 and 20, indicate the percentage of accuracy with which the networks for the various bins are able to classify the data records. The Avg. field contains the simple average over the eight bins for each of the binary classification types, and OVR stands for the overall accuracy if multiple classification and null classification errors are taken into account. The results are labelled in the comparison tables below as RBF Net A. The Macro1 and Macro2 fields contain the accuracies derived from training the network to recognize all 8 bins at once. Macro1 was trained with 730 epochs and Macro2 with 1030 to show the benefits of training and to provide a measure of potential overfitting. The results for Macro2, which are used for comparison, translate into the following tables under the heading RBF Net B.

Table 25. Error Analysis
EType      OVR      Macro
Null V.    98.749   N/A
Mult V.     0.000   N/A
MultZ V.    0.000   N/A
Null T.     7.545   N/A
Mult T.     0.307   N/A
MultZ T.    0.184   N/A
Missed     N/A      42.13

In the error analysis displayed in Table 25, a Null error indicates that no bin was assigned to the record in either V (validation set) or T (training set) testing. A Mult error indicates multiple classification. A MultZ error indicates multiple classifications outside of the 1-off (acceptable error) range. These apply to the binary approach. The Missed error type is a simple misclassification outside of the 1-off (acceptable error) range, and this type of error applies to the Macro2 network.

5.2. Analysis of Performance

The data obtained from the two approaches to classification using the Radial Basis network yield some notable results.
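The error categories used for the binary approach in Table 25 can be sketched as follows. This is a minimal illustration, not the thesis's report module: the class name BinaryVoteSketch, the Outcome values, and the exact reading of MultZ (several bins fire and at least one of them lies more than one bin from the expected bin) are assumptions made here for the sake of the example.

```java
public class BinaryVoteSketch {
    enum Outcome { NULL, SINGLE, MULT, MULT_Z }

    // fired[i] is true when the network for Bin i+1 output a 1; expectedBin is 1-based.
    static Outcome classify(boolean[] fired, int expectedBin) {
        int count = 0;
        boolean outsideRange = false; // some fired bin is more than one bin from the expected one
        for (int i = 0; i < fired.length; i++) {
            if (fired[i]) {
                count++;
                if (Math.abs((i + 1) - expectedBin) > 1) {
                    outsideRange = true;
                }
            }
        }
        if (count == 0) return Outcome.NULL;   // no bin assigned to the record
        if (count == 1) return Outcome.SINGLE; // exactly one bin assigned
        // Assumed reading of MultZ: multiple bins fired, at least one outside the 1-off range.
        return outsideRange ? Outcome.MULT_Z : Outcome.MULT;
    }

    public static void main(String[] args) {
        boolean[] none = new boolean[8];
        boolean[] two = new boolean[8];
        two[2] = true; // Bin 3
        two[3] = true; // Bin 4
        System.out.println(classify(none, 4)); // NULL
        System.out.println(classify(two, 4));  // MULT
    }
}
```

Under this reading, a network set that outputs 0 for nearly every validation record scores well bin by bin but produces almost nothing except Null outcomes once the eight outputs are combined.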
At first glance it appears as if the binary approach, splitting the problem into eight yes-or-no type segments, produces a high level of accuracy. Error analysis, however, indicates that this appearance arises because, for each of the 8 bins, only an eighth of the data patterns on average will produce 1s, while all the others produce 0s. During training, therefore, the initial accuracy if all outputs were set to 0 is already 7 out of 8, or 87.5%. By modifying relatively few outputs, the Radial Basis network reported an accuracy of nearly 100% for the training set. The problem with this approach was revealed by subtracting out the Null errors (reporting expected 1s as 0s for all bins): the resulting accuracy, when applied to the validation sets, falls off to practically 0%.

The Macro approaches, on the other hand, report much lower training accuracy. However, the prediction results of training all 8 bins at once do not diminish significantly when errors are taken into account. The ability of the network to actually classify data is much higher, and much more consistent. While the overall result of the binary approach on the validation set is an extremely low 0.65%, the Macro network reports 45.89%. The significance of these numbers when compared to the previous approaches to classification is discussed in the following chapter.

In comparing the two Macro networks, that which was trained for 730 epochs vs. that which was trained for 1030, we find that there is actually a slight decrease in the reported validation accuracy with more training time. While the difference between the two was less than a percent in each case, and thus Macro2 was a valid choice for comparison against the prior approaches to the classification problem, it does indicate that the actual optimum performance of the RBF Network may not have been achieved.

Chapter 6
Performance Comparisons

6.1. Comparison with Regression Results

Table 26. Comparison of RBF Network and Regression Results
Classtype   RBF Net A   RBF Net B   Forced   Backward   Forward   Stepwise
Exact V.      0.435       16.75     16.75     15.96      16.40     16.20
1-Off V.      0.653       45.89     16.63     16.81      16.73     16.81
Exact T.     92.056       23.39     53.92     45.80      46.43     46.24
1-Off T.     92.203       57.87     47.36     47.22      47.42     47.22

Figure 21. Radial Basis Networks and Regression Analysis

Table 26 and Figure 21 above show comparisons between the system implemented for the thesis work (RBF Net) and the methods used by CUTR for predicting the VTR change based upon the list of incentives. Forced is the heading of the column containing the force-entered regression results, using all the attributes regardless of their apparent effectiveness as predictors. Backward shows the results of backward regression; Forward and Stepwise in a similar manner show the results in the Total column of the tables for forward and stepwise regression. The Tables for this data are 8, 10, 13 and 16.

6.2. Comparison with Backpropagation Networks

Table 27. Comparison of RBF Network and Regression Networks 1
Classtype   RBF Net A   RBF Net B   Forced   Backward   Forward   Stepwise
Exact V.      0.435       16.75     15.92     16.16      16.12     17.23
1-Off V.      0.653       45.89     15.81     16.08      16.12     16.21
Exact T.     92.056       23.39     46.51     46.75      46.12     45.09
1-Off T.     92.203       57.87     46.68     46.95      46.85     46.65

Figure 22. Radial Basis Networks and Backpropagation Analysis

Table 28. Comparison of RBF Network and Regression Networks 2
Classtype   RBF Net A   RBF Net B   Backward   Forward   Stepwise1   Stepwise2   Stepwise3
Exact V.      0.435       16.75      17.26      17.07      17.07       16.00       16.92
1-Off V.      0.653       45.89      17.66      18.30      17.59       17.52       18.30
Exact T.     92.056       23.39      48.09      47.22      47.22       46.31       47.82
1-Off T.     92.203       57.87      48.15      48.56      47.89       47.87       49.53

Figure 23. Radial Basis Networks / Hidden Backpropagation Analysis

A comparison of total (overall) accuracies is considered once more in Tables 27 and 28.
Table 27 shows the difference between the results of the thesis system and the backpropagation networks built using the attributes of the type of regression with which each column is labelled. This table uses the networks for which the results are fully described in Tables 9, 11, 14 and 17, those with no hidden units in their architecture. Table 28 shows the comparison results between the Radial Basis network and networks built with regression-determined attributes and a number of hidden units, as described in Section 4.2. Figures 22 and 23 correspond to the data in Tables 27 and 28, providing a graphical representation of the numbers.

The force-entered regression results were not considered significant enough to merit a hidden-unit network; however, the stepwise regression technique produced three backpropagation architectures, labelled in Table 28 as Stepwise1, Stepwise2 and Stepwise3. These consist of the Total column from Tables 18, 19 and 20 respectively. Stepwise1 is a network built upon only those attributes selected as being predictive by the stepwise regression. Stepwise2 builds a similar network, including all the costs associated with each incentive program; some of the costs were not considered to be significant by the technique. Stepwise3 builds a network on the reduced data set used in this thesis. The attributes selected from this refined set of information are found in Appendix A, and make this comparison one of the most significant, as it is a direct comparison of network performance on identical data patterns.

Chapter 7
Conclusions

7.1. Final Analysis

As the results of Chapter 6 indicate, the two approaches employed in utilizing Radial Basis Function networks vary greatly in effectiveness. The binary approach to the problem reported extremely high accuracy when applied to the training set; however, it was unable to perform with any degree of validity when it was exposed to data patterns not previously encountered. For the validation tests, the binary classification network performed the most poorly, and this work has demonstrated that splitting the categorization problem into binary components does not constitute an effective form of analysis using current techniques.

The Radial Basis Function, when applied to all bins at once, produces results that are far more useful in practice. The network implemented during this thesis reported predictions on the data that were comparable to the best of the Regression and Back Propagation techniques for Exact classification on the validation set, although its Exact accuracy on the training set (23.39%) was lower than theirs. The usefulness of the RBF Network is more clearly revealed by examining the 1-Off classification accuracy for both testing types. In both the training and validation tests, the Radial Basis network outperforms all the other approaches. On the validation set the margin is wide: its 1-Off accuracy of 45.89% is roughly 2.5 times the 18.30% of the next-best approach, and more than doubles that of the Stepwise Regression Back Propagation Network with hidden units, against which it can be most closely compared.

The divergence of values for the RBF Network between the Exact and 1-Off accuracies may be a natural product of the approximation characteristic of the Green's function used by the network. (See Chapter 2.2 and [6])

While further testing is needed to show the exact limits of the Radial Basis function's ability to classify data of this type, particularly as overtraining may have been a minor factor, the results are significant enough to conclude that the techniques presented in this work constitute the most effective method yet examined for analyzing the type of data sets provided.

7.2. Future Work

The low accuracy reported by all techniques used to classify the validation set of the traffic information indicates that further refinements to the data set may be in order.

The training time required by the RBF Network was significant, the application often taking several days to produce a result. Testing on different platforms may be effective in deciding how to reduce the training time.

Further data sets from Washington, DC and Tucson, AZ will be analyzed using the same techniques. This will further test the Radial Basis Network's ability to deal with this type of information.

In terms of the implementation itself, two improvements may be made. First, an internal rounding method may be introduced in order to eliminate the rare division-by-zero error reported in Section 4.3.2. Second, automatic selection of the number of hidden units as described in [14] may be implemented to increase the overall accuracy of the system. Testing will reveal whether or not the tradeoff between accuracy and training time, if this addition is made, will be worthwhile.

References

[1] Consumer Information, California Air Resources Board Official Website, http://www.arb.ca.gov/, May 7, 2003
[2] R. Yelkur, "Radial Basis Function Network for Predicting The Impact of Trip Reduction Strategies", Thesis report, April 1999
[3] R. Perez, "Artificial Neural Networks", University of South Florida Lecture Notes, Spring 2002
[4] A. Blum, R.L. Rivest, "Training a 3-node neural network is NP-complete", Advances in Neural Information Processing Systems I, pp. 494-501, San Mateo, California, 1989
[5] P. van der Smagt, G. Hirzinger, "Why feedforward networks are in a bad shape", Proceedings of the 8th International Conference on Artificial Neural Networks (ICANN), Skövde, Sweden, 2-4 September 1998, pp. 159-164, Springer-Verlag Berlin Heidelberg New York, 1998
[6] S. Haykin, Neural Networks, A Comprehensive Foundation, 2nd ed., Prentice-Hall, 1999
[7] D. Katić, S. Stanković, "Fast Learning Algorithms for Training of Feedforward Multilayer Perceptrons Based on Extended Kalman Filter", IEEE International Conference on Neural Networks, Vol. 1, pp. 196-201, 1996
[8] N. Sundararajan, P. Saratchandran, L.
Ying-Wei, Radial Basis Function Neural Networks with Sequential Learning, World Scientific Publishing Co. Pte. Ltd., 1999
[9] T. Cover, "Geometrical and Statistical Properties of Systems of Linear Inequalities with Applications in Pattern Recognition", IEEE Transactions on Electronic Computers, EC-14(3):326-334, June 1965
[10] L. Mendelsohn, "Preprocessing Data For Neural Networks", Technical Analysis of Stocks & Commodities (magazine), Technical Analysis, Inc., 1993
[11] L. Kai Hansen, J. Larsen, "Unsupervised Learning and Generalization", IEEE International Conference on Neural Networks, Vol. 1, pp. 25-30, 1996
[12] T. Poggio, F. Girosi, "A Theory of Networks for Approximation and Learning", Massachusetts Institute of Technology Artificial Intelligence Laboratory and Center for Biological Information Processing, Whitaker College, Cambridge, MA, 1989
[13] H. M. Deitel, P.J. Deitel, Java™ How To Program, 4th edition, Prentice-Hall, Inc., 1997
[14] P. Kanjilal, "Orthogonal Transformation Techniques in the Optimization of Feedforward Neural Network Systems", Optimization Techniques, pp. 53-79, Academic Press, 1998

Appendices

Appendix A: Table of Incentive Variables and Dependent Variable

Table 29. Table of Incentive Variables and Dependent Variable
Input Code        Variable Description
BFO               Facility improvements
BFR               Bike racks and lockers
BGC               Company vehicle guaranteed return trip
BGO               Other guaranteed return trip program
BGR               Rental car guaranteed return trip
BGT               Taxi guaranteed return trip
BGT_DOLLAR_AMT    Cost associated with implementing BGT
BGU               Unscheduled overtime guaranteed return
BIKE              Ratio of bike users to employees
BMM               Posted materials (Marketing)
BMN               New hire orientation (Marketing)
BMO               Other marketing elements
BMR               Company recognition (Marketing)
BMS               Special interest clubs (Biking, Walking)
BUS               Ratio of employees who use a bus system
CAR2              Two people per vehicle
CAR4              Four people per vehicle
COMPRESSED        Binary value: Are compressed weeks used?
CWW336            Compressed work week: 3 days, 36 hours
CWW440            Compressed work week: 4 days, 40 hours
CWW980            Compressed work week: 9 days, 80 hours
DFT_DOLLAR_AMT    Cost associated with implementing DFT*
DNT               Additional time off with pay
DPC               Increased parking costs for drive-alones
DPO               Other park-management strategies
DW4               Compressed work week: 4 days, 40 hours
DW4_DOLLAR_AMT    Cost associated with implementing DW4
DWO               Other compressed work week schedules
ISS               Cafeterias, ATMs, Postal, Fitness Centers
IST_DOLLAR_AMT    Cost associated with implementing IST**
RS_MATCH          A program to match drivers and riders
TARGET_AVR        Average Vehicle Ridership goal
TELECOMMUTE       Binary value: Is telecommuting available?
TRANSIT           Binary value: Are shuttle systems available?
DELTA_VTR         Vehicle Trip Rate; Dependent variable
*DFT, not used as a predictor, stands for Ongoing transit subsidies
**IST, not used as a predictor, stands for Transit information or pass sales

Appendix B: Code Modules

    // Imports needed by the excerpts below (members of the enclosing network class).
    import java.io.IOException;
    import java.util.Date;
    import javax.swing.JOptionPane;

    public void Train() throws IOException
    {
        String outstring = "Times:\n";
        int n, Epoch;
        for (n = 0; n < MAXE; n++)
        {
            epochSSE[n] = 0;
        }
        maxSSE = 0;
        date = new Date();
        outstring += "Start: " + sdf2.format(date);
        for (Epoch = 0; Epoch < MAXE; Epoch++)
        {
            for (int row = 0; row < patterns; row++)
            {
                Compute(row);
                e = (target[row] - Zval);
                for (n = 0; n < neunum; n++) // For each neuron
                {
                    Wdelta = e*outvector[n];
                    weight[n] = weight[n] + MU1*Wdelta; // Change the weights
                    if (tsel > 1) // If training method is 2 or 3...
                    {
                        for (int m = 0; m < inpnum; m++) // Change the ref. vectors
                        {
                            Rdelta = 2*weight[n]*e*toutvector[n][m]*(invector[m] - refvector[n][m]);
                            refvector[n][m] = refvector[n][m] + MU2*Rdelta;
                        }
                    }
                    if (tsel > 2) // If training method is 3...
                    {
                        Mprod = 0;
                        for (int m = 0; m < (inpnum - 0); m++) // Calculate Mprod value
                        {
                            Qval = (invector[m] - refvector[n][m])*(invector[m] - refvector[n][m]);
                            Mprod += toutvector[n][m]*Qval;
                        }
                        Sdelta = (-1)*weight[n]*e*Mprod;
                        Sigma[n] = Sigma[n] - Sdelta; // Change the Sigma values
                    }
                }
                epochSSE[Epoch] += e*e;
                if (epochSSE[Epoch] > maxSSE) maxSSE = epochSSE[Epoch];
                // truncating the output at 4 decimal places.
                Zval = (int)(Zval*10000);
                Zval = Zval/10000;
            }
            if (epochSSE[Epoch] < MINSSE) Epoch = MAXE - 1; // Break out of training
            // Truncate the epoch SSE at 2 decimal places.
            epochSSE[Epoch] = (int)(epochSSE[Epoch]*100);
            epochSSE[Epoch] = epochSSE[Epoch]/100;
            //outstring += " " + epochSSE[Epoch] + " ";
            //if (Epoch % 20 == 0) outstring += "\n";
        }
        date = new Date();
        outstring += "\nEnd : " + sdf2.format(date);
        JOptionPane.showMessageDialog(null, outstring, "Zephon: Timing report", JOptionPane.INFORMATION_MESSAGE);
        //Last computation, for diagram only!
        Compute(display);
        // JOptionPane.showMessageDialog(null, refoutput, "Zephon: Reference Vector", JOptionPane.INFORMATION_MESSAGE);
        Errorgraph.setVisible(false);
        Errorgraph.Drawgraph();
    }

    public class Neuron extends Object
    {
        // Copies row rownum of rep into this neuron's reference vector.
        public Neuron(double rep[][], int rownum)
        {
            refoutput += rownum + ": ";
            for (int n = 0; n < inpnum; n++)
            {
                refvector[rownum][n] = rep[rownum][n];
                refoutput += refvector[rownum][n] + " ";
            }
            refoutput += "\n";
        }

        //Internal compute for a single neuron
        public void Compute(int n) throws IOException
        {
            //Calculation formulas for output of neuron
            R = 0;
            for (int q = 0; q < inpnum; q++)
            {
                R += ((invector[q] - refvector[n][q])*(invector[q] - refvector[n][q]));
            }
            R = Math.sqrt(R);
            outvector[n] = Math.exp(-((R)*(R))/((Sigma[n]*Sigma[n])));
            for (int m = 0; m < inpnum; m++)
            {
                toutvector[n][m] = (2 - (2*refvector[n][m]))*outvector[n];
            }
        }
    }

    // Network-level compute: loads pattern inptrn and sums the weighted neuron outputs.
    public void Compute(int inptrn) throws IOException
    {
        Zval = 0;
        for (int q = 0; q < inpnum; q++)
        {
            invector[q] = x[inptrn][q];
        }
        for (int n = 0; n < neunum; n++)
        {
            NeuronCell[n].Compute(n);
            Zval += (outvector[n]*weight[n]);
        }
    }

Appendix C: Sample Files

Trained Level 3 AND files:

AND.ipf
0 0 \
1 1 \
1 0 \
0 1 \

AND.itf
0 \
1 \
0 \
0 \

ANDt.rvf
0.7650807090630533 1.0 \
1.0 0.20697885367991298 \
0.10443296078870729 1.0 \
0.355591024503458 1.0989041737714167 \

ANDt.swf
0.4 1.3880162594688192 \
0.4 0.005015728257667695 \
0.4 0.12242247314478774 \
0.4 0.1854413995478036 \

Report file from running the above files:

ANDt.rrf
**Report file from Network ANDt**
Pattern 0: T: 0.0  Z: 3.974412885673406E-4
Pattern 1: T: 1.0  Z: 0.29875977082415794
Pattern 2: T: 0.0  Z: 0.001905613025937431
Pattern 3: T: 0.0  Z: 0.2753843358457124
Threshold: 0.29875977082415794; Training SSE: 0.0
(if Z >= Threshold, it is considered a 1)
Rounded Z scores:
Pattern 0: T: 0.0  Z: 0.0
Pattern 1: T: 1.0  Z: 1.0
Pattern 2: T: 0.0  Z: 0.0
Pattern 3: T: 0.0  Z: 0.0
Correct: 4, Misclassified: 0, %Accuracy: 100.0

Segments of trained traffic files (those used for obtaining the Macro1 thesis results):

LADAT.ipf
0 0 0 1 1 0 0 1 0.007940522 0 1 1 0 0 0.019212455 0.027810331 0.014350236 1 0.281254947 0.027629741 0.027867407 0.10806974 0 0 0 1 1 0 0 0.010011575 0.028660744 0.366610289 0.267901421 \
0 0 1 1 1 1 1 1 0.060433615 1 0 0 0 0 0.019212455 0.061555501 0.077500544 1 0.020790547 0.034706559 0.027867407 0.130840153 0 0 0 1 0 0 0 0.010011575 0.028660744 0.083310895 0.002928499 \
0 1 0 0 0 0 0 0 0.230755121 1 1 0 0 0 0.019212455 0.069081128 0.067711264 0 0.020790547 0.039273381 0.027867407 0.028433111 0 1 0 0 0 0 1 0.010011575 0.028660744 0.698257327 0.654390574 \
0 1 0 1 0 0 1 0 0.518388152 1 1 0 0 0 0.019212455 0.167085022 0.003824828 0 0.020790547 0.039273381 0.027867407 0.130840153 0 0 0 0 0 0 1 0.010011575 0.028660744 0.060874447 0.070689775 \
0 0 0 0 0 0 0 0 0.114467531 0 0 0 0 0 0.019212455 0.485587955 0.067711264 0 0.020790547 0.039273381 0.027867407 0.346979648 0 0 0 0 0 0 1 0.010011575 0.028660744 0.124674529 0.462849259 \
0 1 0 0 0 0 0 0 0.089525841
0 0 0 0 1 0.30067265 0.047616865 0.085912019 1 0.020790547 0.051448803 0.065387353 0.130840153 0 0 0 1 0 0 0 0.010011575 0.075112633 0.299193382 0.20265168 \ 0 0 1 1 0 1 1 1 0.127898186 0 1 1 0 0 0.019212455 0.129917115 0.031334825 0 0.020790547 0.039273381 0.027867407 0.130840153 0 0 0 0 0 1 0 0.010011575 0.028660744 0.067623451 0.172842011 \ ... LADAT.itf 0.2 \ 0.7 \ 0.8 \ 0.3 \ 0.1 \ 0.8 \ 0.4 \ 0.2 \ 0.2 \ 0.1 \ 0.4 \ 0.6 \ 0.2 \ 0.3 \ ... LAALL1.rvf 4.965484990048543E10 6.666887053094437E5 7.684886216322444E7 1.0 1.0 1.0 1.0 1.0 0.12793036253118942 9.604617743948704E7 1.0 1.0 1.7697754013871617E5 2.065417866663245E10 0.019212295477742385 0.22137033413551901 0.019255600499939858 1.1162027586523322E8 0.020785316837905132 0.02501093401337006 0.027867407967155335 0.20389487061978853 1.0 9.308339843204276E10 2.878903268491265E10 4.273381292915034E14 4.8662834068183786E17 3.9081772759333534E5 6.321628658834787E7 0.010009458120773794 0.028754769876708875 0.05259669153504068 0.07591030159550989 \ 1.8055384742298853E12 1.1803537335965208E6 5.8398005087401E9 1.0 3.2923712169722443E9 1.0 1.0 1.0 0.1278982962165396 2.0993503531088973E8 1.0 1.0538081181617227E6 3.7924354272978295E9 7.43669809414538E12 0.01921245614915573 0.0937748581990505 0.007297816209911526 1.0 0.020790703986505395 0.039273554801801024 0.027867399571237832 0.13084015063389592 1.030377385752942E12 4.2065804982189316E13 1.0 1.0 1.9466985541901544E14 2.941965632774466E7 3.4734038397182486E10 0.010011574991574087 0.02866098444613393 0.0593187625276591 0.2118086770043505 \ 1.0 1.0 5.6413309863705476E11 9.019210385937662E11 1.0 1.0 1.0 8.076393353804181E16 0.009769849033450453 7.003008908092641E15 1.0 1.0 7.500379134316337E11 1.1095614447219561E10 0.019212455000002862 0.1631868479230969 0.04362844299853249 1.0 0.015635638002750137 0.025207566010307575 0.016052768998924657 0.13084015306436786 1.4362570592867387E13 1.0 3.378580223265652E17 1.04.070262650749548E18 1.0 6.160495228315241E14 PAGE 82 73 Appendix C 
(continued) 0.010011574999997318 0.02153789803587048 0.07101665398171253 0.2604999239734895 \ ... LAALL.swf 0.3871851681555741 0.16955602936620337 \ 0.39985109929256546 0.17626976563578114 \ 0.4000000136209009 0.02961121820668597 \ 0.4627035915091099 0.4012768757588607 \ 0.38213811339493137 0.31090134134182545 \ 0.3894535368944984 0.12202782854167503 \ 0.6673373699036712 0.1842181363543176 \ 0.399999901020371 0.0431161453152646 \ 0.38020375075833585 0.2561816828973236 \ 0.34595870487782326 0.7347200387778526 \ 0.45225560161449635 0.12650345955148362 \ 0.5113718604607151 0.28348429488390287 \ ... LAALL.rrf *************************************************************** NNet Files: LAALL1.swf & LAALL1.rvf Data Files: BaseVal.ipf & BaseVal.itf Date: 28/08/04 Time: 11:04:17 PM *************************************************************** For each pattern, T is the Target and Z is the predicted value. Pattern 0: T: 0.4  Z: 0.5185153349260078 Pattern 1: T: 0.4  Z: 0.26966354013311056 Pattern 2: T: 0.2  Z: 0.17900644562544815 Pattern 3: T: 0.2  Z: 0.13956177448770138 Pattern 4: T: 0.1  Z: 0.327052150599257 Pattern 5: T: 0.7  Z: 0.16430814991579615 Pattern 6: T: 0.8  Z: 0.5039561275422964 Pattern 7: T: 0.5  Z: 0.3036430006801017 Pattern 8: T: 0.7  Z: 0.3197963135612663 Pattern 9: T: 0.3  Z: 0.5052671476679972 Pattern 10: T: 0.5  Z: 0.36140455728779936 Pattern 11: T: 0.4  Z: 0.3339178336851348 Pattern 12: T: 0.7  Z: 0.25293456145556376 ... 
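A report in the T/Z format shown above could be scored for Exact and 1-Off accuracy along the following lines. This is only a sketch under stated assumptions: it assumes that targets 0.1 to 0.8 stand for Bins 1 to 8, that a predicted Z is mapped to a bin by multiplying by 10 and rounding, and that the 1-Off count includes exact matches. The class name ReportScoreSketch and its methods are hypothetical and are not part of the thesis code.

```java
public class ReportScoreSketch {
    // Assumed mapping: a value such as 0.4 stands for Bin 4, so multiply by 10 and round.
    static int toBin(double value) {
        return (int) Math.round(value * 10.0);
    }

    // Returns {exactCount, withinOneCount}; the within-one count includes exact matches.
    static int[] score(double[] targets, double[] outputs) {
        int exact = 0;
        int withinOne = 0;
        for (int i = 0; i < targets.length; i++) {
            int diff = Math.abs(toBin(targets[i]) - toBin(outputs[i]));
            if (diff == 0) exact++;
            if (diff <= 1) withinOne++;
        }
        return new int[] { exact, withinOne };
    }

    public static void main(String[] args) {
        // Patterns 0 to 4 of the LAALL.rrf sample above.
        double[] t = { 0.4, 0.4, 0.2, 0.2, 0.1 };
        double[] z = { 0.5185153349260078, 0.26966354013311056, 0.17900644562544815,
                       0.13956177448770138, 0.327052150599257 };
        int[] s = score(t, z);
        System.out.println("Exact: " + s[0] + " of " + t.length
                + ", within one bin: " + s[1] + " of " + t.length);
        // prints "Exact: 1 of 5, within one bin: 4 of 5"
    }
}
```

On these five sample patterns only one prediction lands in the expected bin, while four land within one bin of it, which mirrors the gap between the Exact and 1-Off figures reported for the Macro networks.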