# Introduction

Going concern prediction (GCP) is an important element in investors' decision-making. Rapid advances in technology, vast environmental changes, and increasing competition have affected the security of investment. Moreover, Statement on Auditing Standards (SAS) No. 59 requires the auditor, on every audit, to evaluate whether substantial doubt exists about the firm's ability to continue as a going concern (AICPA, 1988). However, the guidelines in SAS 59 have been criticized as deeply subjective, general, and ambiguous (Koh & Killough, 1988). Consequently, assessing going concern status is sometimes a difficult process, and its complexity has led to the development of several models that employ multiple financial and non-financial variables that might signal a going concern opinion to the auditor (Martens et al., 2008).

Early GCP studies applied statistical techniques such as multiple discriminant analysis, logit, and probit (McKee, 1976; Kida, 1980; Koh, 1987; Menon & Schwartz, 1987; Koh & Brown, 1991). In recent years, data mining has become established and has grown rapidly in the financial area, opening a new approach for in-depth research. By drawing on large volumes of financial data, data mining techniques can dynamically extract valuable and previously unknown knowledge. Several GCP studies have used data mining techniques, and their findings indicate that these techniques are able to predict the going concern status of firms and that accounting data are useful in GCP (Brabazon & Keenan, 2004; Koh & Kee Low, 2004; Martens et al., 2008; Mokhatab et al., 2011). Statistical techniques are now used less often because of their restrictive assumptions (such as normality, linearity, and independence of variables). This research applies Classification and Regression Tree (CART) and Naïve Bayes Bayesian Network (NBBN) to GCP.
Results from this study will help managers keep track of a company's performance, identify significant problems, and take efficient measures to reduce the incidence of failure. In addition, the model helps lenders and other stakeholders form a clear and comprehensive picture of the firm's prospective status. Auditors can also use the study's results in the final stages of the audit engagement as a quality-control device or as a benchmark for auditor judgment. In particular, auditors can apply the GCP model in this paper to assess potential clients and to identify non-going concern firms that might require further consideration.

# II. Research Development

The data set is composed of 146 Iranian manufacturing companies: 73 bankrupt firms matched with 73 firms with going concern status, all of which were or still are listed on the Tehran Stock Exchange (TSE) between 2001 and 2011. Table 1 lists the 42 proposed variables used in this study. After data collection, feature selection was performed with the t-test and Stepwise Discriminant Analysis (SDA) at a significance level of 0.05 to select the final variables. Feature selection potentially makes data easier to visualize and understand and reduces measurement and storage requirements (Ashoori & Mohammadi, 2011). Another purpose of these tests is to determine the financial ratios that can distinguish between the two groups of companies (going concern and non-going concern status). The result of the SDA process is shown in Table 2. The ratios entered in the model are total liabilities to total assets ($X_{9}$), retained earnings to total assets ($X_{31}$), operational income to sales ($X_{36}$), and net income to total assets ($X_{34}$).
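The t-test screening step described above can be sketched as follows. This is a minimal illustration rather than the authors' procedure: it assumes Welch's unequal-variance statistic and approximates the two-sided p-value with a standard normal distribution, which is only reasonable for groups of roughly the size used here (73 firms each). The function names and the toy ratio values are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def welch_t(group_a, group_b):
    """Welch's t statistic for the difference in means of two samples."""
    se = sqrt(variance(group_a) / len(group_a) + variance(group_b) / len(group_b))
    return (mean(group_a) - mean(group_b)) / se


def approx_p_value(t_stat):
    """Two-sided p-value under a normal approximation (an assumption;
    an exact test would use Student's t distribution instead)."""
    return 2.0 * (1.0 - NormalDist().cdf(abs(t_stat)))


def screen_ratios(going, non_going, alpha=0.05):
    """Keep the ratios whose group means differ at significance level alpha."""
    selected = []
    for name in going:
        p = approx_p_value(welch_t(going[name], non_going[name]))
        if p < alpha:
            selected.append(name)
    return selected


# Hypothetical toy values, NOT the paper's data.
going = {
    "TL/TA": [0.60, 0.65, 0.70, 0.66, 0.72, 0.64],
    "Size": [5.1, 5.3, 5.2, 5.4, 5.2, 5.3],
}
non_going = {
    "TL/TA": [0.78, 0.82, 0.85, 0.79, 0.80, 0.83],
    "Size": [5.2, 5.3, 5.1, 5.4, 5.3, 5.2],
}
selected = screen_ratios(going, non_going)
print(selected)
```

With real data, a statistics package that provides the exact t distribution would be the usual choice; the stepwise discriminant step is not reproduced in this sketch.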
After the financial ratios were extracted, the following discriminant model was constructed:

$$Z = -0.374\,X_{9} + 0.293\,X_{31} + 0.359\,X_{36} + 0.384\,X_{34} \quad (1)$$

The CART methodology was popularized in the 1980s by Breiman et al. (1984). In the area of GCP, the goal of the analysis via CART is to obtain a set of if-then rules with acceptable accuracy that determine which companies will or will not remain a going concern in the future. CART was selected because it is nonparametric and can easily handle outliers. It is also flexible and able to adjust over time (Timofeev, 2004). To obtain the best predictive accuracy, CART is built to minimize the misclassification cost, which takes both variance and misclassification rates into consideration. Choosing the splits on the features used to predict the class membership of firms is a significant step. The computational core of CART lies in finding the best split rules in order to build an uncomplicated, informative, and accurate tree. CART regards all variables as independent when calculating splits on the training data set. The $i$th sample is expressed as $(x_{1}^{i}, x_{2}^{i}, \ldots, x_{j}^{i}, \ldots, y^{i})$, where $x_{j}^{i}$ is the value of the $i$th sample firm on the $j$th feature and $y^{i}$ is the label of the sample. Since CART is a binary recursive partitioning method in which every node of the data splits into two sub-nodes, the values of $y^{i}$ in a classification problem are binary, e.g., -1 or 1. During splitting, if the split condition on the feature value $x_{j}^{i}$ is met, the sample goes right; otherwise it goes left. A node is split only when the split yields the greatest improvement in prediction accuracy. Among the node impurity measures, Breiman et al.
(1984) proposed the Gini index as the criterion for reducing impurity when splitting for classification, since it can be computed more rapidly and is readily extended to include symmetrized costs. In the classification problem of GCP, the Gini index of impurity of a node can be written as follows (Breiman et al., 1984):

$$I_{gini} = 1 - \sum_{j} p(c_{j})^{2}$$

where $p(c_{j})$ indicates the relative frequency of class $c_{j}$ in the node. The Gini index reaches zero when only one class is present at a node; that is, if all cases in a node belong to the same class, the Gini index is zero (Li, Sun & Wu, 2010). CART applies backward pruning algorithms. Pruning is necessary to build smaller tree models that perform better on new data and not just on the training data. CART prunes and selects at each node in the tree when the tree is fit (Soni, 2010). Once the classification or regression tree is constructed, it can be used to classify new data. The output of this stage is a class or response value assigned to each new observation. Through the set of questions in the tree, each new observation reaches one of the terminal nodes of the tree. A new observation is assigned the dominating class or response value of the terminal node it reaches.

b) The Method of Naïve Bayes Bayesian Network (NBBN)

Bayes networks are a powerful tool for representing relationships between a set of variables, and they are suitable for dealing with uncertainty in expert systems (Markov, 2007). The purpose of the Bayes network is to establish a model that can classify companies correctly using financial ratios. A NBBN is based on Bayes' rule, which is expressed as follows:

$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)} \quad (2)$$

In the going concern problem, $P(A)$ is the percentage of companies with going concern status, $P(B)$ is the share of each of the independent variables used for GCP, and $P(A \mid B)$ is the probability of going concern status one year ahead. An example of a NBBN can be seen in Figure 1. In this figure, $A$ is the dependent variable and $B_{1}$, $B_{2}$, $B_{3}$, and $B_{4}$ are independent variables (Sun & Shenoy, 2007).

# Experimental Results

The proposed CART and NBBN models were implemented in MATLAB 7.6. The results come from 10 testing data sets obtained by 10-fold cross-validation (see Table 3).

As shown in Table 5, the McNemar test at the 5% level indicates a significant difference between the two models in GCP. According to Table 6, Type I error is the probability that a company with non-going concern status is classified as a company with going concern status, and Type II error is the probability that a company with going concern status is classified as a company with non-going concern status. The costs of these two types of error are very different: the cost of incorrectly classifying a non-going concern company as a going concern (Type I error) is much larger than the cost of incorrectly classifying a going concern company as a non-going concern (Type II error). On hold-out data, Type I and Type II errors are 2.5 and 0 percent for the CART model and 22.64 and 22.65 percent for the NBBN model.

# Conclusion

The current study demonstrated the feasibility of applying CART and NBBN to predict going concern status with data collected from Iran. It considered a set of 42 variables proposed in prior literature on financial status prediction models in Iran and applied SDA to identify potential variables for the GCP model. Four financial ratios were finally selected, and the CART and NBBN GCP models were constructed from these features.
The empirical tests show that the CART model achieved mean accuracy rates of 98.62 and 99.92 percent on training and hold-out data, and the NBBN model 100.00 and 75.55 percent, respectively (Table 3). Moreover, McNemar's test results indicate a significant difference between the two models in predicting going concern. In summary, the results from 146 Iranian companies indicate that the CART model has appropriate ability for GCP of firms. Further, this research empirically tested feature selection using statistical techniques; feature selection using data mining algorithms could be examined in future research.

![Data Mining Approach to Prediction of going Concern using Classification and Regression Tree (CART)](image-2.png "D")

![Figure 1 : NBBN for predicting of going concern](image-3.png "Figure 1 :")

Global Journal of Management and Business Research (D), Volume XIII, Issue III, Version I, Year 2013. © 2013 Global Journals Inc. (US)

Table 1: The 42 proposed variables

| # | Definition of variables | Mean of Group 1 | Mean of Group 0 | Sig. level |
|---|---|---|---|---|
| 1 | EBIT/TA | 0.18 | 0.05 | 0.00 |
| 2 | LTD/SE | 0.20 | 0.56 | 0.06 |
| 3 | RE/SC | 0.65 | 0.02 | 0.00 |
| 4 | MVE/TL | 1.40 | 0.66 | 0.00 |
| 5 | MVE/SE | 2.42 | 2.57 | 0.22 |
| 6 | MVE/TA | 0.77 | 0.48 | 0.00 |
| 7 | Ca/TA | 0.05 | 0.03 | 0.00 |
| 8 | Size (log TA) | 5.25 | 5.23 | 0.83 |
| 9 | TL/TA* | 0.67 | 0.80 | 0.00 |
| 10 | CL/SE | 2.27 | 4.76 | 0.00 |
| 11 | CL/TL | 0.86 | 0.85 | 0.94 |
| 12 | (Ca+STI)/CL | 0.11 | 0.05 | 0.00 |
| 13 | (R+Inv)/TA | 0.57 | 0.57 | 0.88 |
| 14 | R/S | 0.53 | 0.40 | 0.10 |
| 15 | R/Inv | 1.18 | 1.00 | 0.93 |
| 16 | SE/TL | 0.63 | 0.32 | 0.00 |
| 17 | SE/TA | 0.35 | 0.22 | 0.00 |
| 18 | CA/CL | 1.31 | 1.07 | 0.00 |
| 19 | QA/CL | 0.70 | 0.57 | 0.00 |
| 20 | QA/TA | 0.37 | 0.36 | 0.73 |
| 21 | FA/(SE+LTD) | 0.60 | 0.91 | 0.01 |
| 22 | FA/TA | 0.22 | 0.24 | 0.63 |
| 23 | CA/TA | 0.70 | 0.68 | 0.66 |
| 24 | Ca/CL | 0.09 | 0.04 | 0.00 |
| 25 | IE/GP | -0.02 | -1.21 | 0.48 |
| 26 | S/Ca | 35.30 | 44.80 | 0.11 |
| 27 | S/TA | 0.93 | 0.70 | 0.00 |
| 28 | WC/TA | 0.13 | 0.00 | 0.00 |
| 29 | PIC/SE | 0.53 | 0.86 | 0.00 |
| 30 | S/WC | 2.87 | 1.73 | 0.96 |
| 31 | RE/TA* | 0.08 | -0.03 | 0.00 |
| 32 | NI/SE | 0.42 | -0.03 | 0.00 |
| 33 | NI/S | 0.16 | -0.02 | 0.00 |
| 34 | NI/TA* | 0.13 | 0.00 | 0.00 |
| 35 | S/CA | 1.34 | 1.07 | 0.00 |
| 36 | OI/S* | 0.20 | 0.06 | 0.00 |
| 37 | OI/TA | 0.17 | 0.03 | 0.00 |
| 38 | EBIT/IE | -5.21 | -0.45 | 0.05 |
| 39 | EBIT/S | 0.52 | 0.10 | 0.00 |
| 40 | GP/S | 0.27 | 0.15 | 0.00 |
| 41 | S/SE | 3.32 | 4.68 | 0.05 |
| 42 | S/FA | 6.29 | 6.44 | 0.33 |

Group 1: going concern firms; Group 0: non-going concern firms. \*: Final variables selected by SDA.

CA: Current assets; NI: Net income; Ca: Cash; OI: Operational income; CL: Current liabilities; QA: Quick assets; PIC: Paid in capital; R: Receivables; EBIT: Earnings before interest & taxes; RE: Retained earnings; FA: Fixed assets; S: Sales; GP: Gross profit; SC: Stock capital; IE: Interest expenses; SE: Shareholders' equity; Inv: Inventory; STI: Short term investments; LA: Liquid assets; TA: Total assets; LTD: Long term debt; TL: Total liabilities; MVE: Market value of equity; WC: Working capital.

Table 2: Results of the SDA process

| Step | Variable | Tolerance | F to Remove | Wilks' Lambda |
|---|---|---|---|---|
| 1 | Net income to total assets | 1.00 | 100.77 | |
| 2 | Net income to total assets | 0.94 | 56.24 | 0.75 |
| | Total liabilities to total assets | 0.94 | 9.07 | 0.55 |
| 3 | Net income to total assets | 0.51 | 8.62 | 0.52 |
| | Total liabilities to total assets | 0.91 | 11.10 | 0.53 |
| | Operational income to sales | 0.55 | 6.11 | 0.51 |
| 4 | Net income to total assets | 0.48 | 4.75 | 0.49 |
| | Total liabilities to total assets | 0.90 | 8.55 | 0.50 |
| | Operational income to sales | 0.54 | 4.57 | 0.49 |
| | Retained earnings to total assets | 0.77 | 4.37 | 0.49 |

Table 3: Accuracy (%) of the CART and NBBN models under 10-fold cross-validation

| Fold | CART training data | CART hold-out data | NBBN training data | NBBN hold-out data |
|---|---|---|---|---|
| 1 | 100.00 | 100.00 | 100.00 | 80.00 |
| 2 | 100.00 | 100.00 | 100.00 | 80.00 |
| 3 | 100.00 | 100.00 | 100.00 | 66.67 |
| 4 | 93.33 | 99.23 | 100.00 | 66.67 |
| 5 | 100.00 | 100.00 | 100.00 | 80.00 |
| 6 | 92.86 | 100.00 | 100.00 | 85.71 |
| 7 | 100.00 | 100.00 | 100.00 | 64.29 |
| 8 | 100.00 | 100.00 | 100.00 | 78.57 |
| 9 | 100.00 | 100.00 | 100.00 | 82.21 |
| 10 | 100.00 | 100.00 | 100.00 | 71.43 |
| Min | 92.86 | 99.23 | 100.00 | 64.29 |
| Max | 100.00 | 100.00 | 100.00 | 85.71 |
| Median | 100.00 | 100.00 | 100.00 | 85.71 |
| Variance | 9.28 | 0.07 | 0.00 | 61.99 |
| Mean | 98.62 | 99.92 | 100.00 | 75.55 |

Table 4

| Fold | Number of rules | Tree height |
|---|---|---|
| 1 | 3 | 2 |
| 2 | 3 | 2 |
| 3 | 3 | 2 |
| 4 | 2 | 1 |
| 5 | 3 | 2 |
| 6 | 2 | 1 |
| 7 | 3 | 2 |
| 8 | 3 | 2 |
| 9 | 3 | 2 |
| 10 | 3 | 2 |

Table 5: McNemar test

| Methods | NBBN |
|---|---|
| CART | -3.536 (0.011) |

Table 6: Type I and Type II errors

| Prediction \ Real status | Non-going concern status | Going concern status |
|---|---|---|
| Going concern status | $1 - P_{22}$ (Type I error) | $P_{11}$ |
| Non-going concern status | $P_{22}$ | $1 - P_{11}$ (Type II error) |
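The Type I and Type II error rates laid out in Table 6 can be computed directly from actual and predicted labels. The sketch below is illustrative only: the function name and the sample labels are hypothetical, and the 1/0 coding is assumed to follow the Group 1 (going concern) and Group 0 (non-going concern) convention of Table 1.

```python
def error_rates(actual, predicted):
    """Return (type_i, type_ii) error rates as fractions.

    Labels are assumed to follow Table 1's grouping:
    1 = going concern, 0 = non-going concern.
    """
    non_going = [p for a, p in zip(actual, predicted) if a == 0]
    going = [p for a, p in zip(actual, predicted) if a == 1]
    if not non_going or not going:
        raise ValueError("both classes must be present in `actual`")
    # Type I: a non-going concern firm predicted as going concern (1 - P22).
    type_i = sum(1 for p in non_going if p == 1) / len(non_going)
    # Type II: a going concern firm predicted as non-going concern (1 - P11).
    type_ii = sum(1 for p in going if p == 0) / len(going)
    return type_i, type_ii


# Hypothetical hold-out fold, NOT the paper's data.
actual = [0, 0, 0, 0, 1, 1, 1, 1, 1]
predicted = [0, 1, 0, 0, 1, 1, 0, 1, 1]
type_i, type_ii = error_rates(actual, predicted)
print(f"Type I: {type_i:.2%}, Type II: {type_ii:.2%}")
```

Reporting the two rates separately, rather than a single accuracy figure, keeps the more costly Type I error visible when models are compared.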
* AICPA (1998). The auditor's consideration of an entity's ability to continue in existence. Statement on Auditing Standards.
* Ashoori, S., & Mohammadi, S. (2011). Compare failure prediction models based on feature selection technique: empirical case from Iran. Procedia Computer Science, 3.
* Brabazon, A., & Keenan, B. (2004). A hybrid genetic model for the prediction of corporate failure. Computational Management Science. Springer-Verlag.
* Breiman, L., Friedman, J., Olshen, R., & Stone, C. (1984). Classification and regression trees. International Group.
* Kida, T. (1980). An investigation into auditors' continuity and related qualification judgments. Journal of Accounting Research, 18(2).
* Koh, H. (1987). Prediction of going-concern status: A probit model for the auditors. Ph.D. dissertation, Virginia Polytechnic Institute and State University.
* Koh, H., & Brown, R. (1991). Probit prediction of going and non-going concerns. Managerial Auditing Journal, 6(3).
* Koh, H. C., & Killough, L. N. (1988). Proposed statement on auditing standards: the auditor's consideration of an entity's ability to continue existence. Virginia Accountant Quarterly, 40(2).
* Koh, H. C., & Kee Low, C. (2004). Going concern prediction using data mining techniques. Managerial Auditing Journal, 19(3).
* Li, H., Sun, J., & Wu, J. (2010). Predicting business failure using classification and regression tree: An empirical comparison with popular classical statistical methods and top classification mining methods. Expert Systems with Applications, 37.
* Markov, Z. (2007). Probabilistic reasoning with naïve Bayes and Bayesian networks. PhD dissertation, Central Connecticut State University.
* Martens, D., Bruyneseels, L., Baesens, B., Willekens, M., & Vanthienen, J. (2008). Predicting going concern opinion with data mining. Decision Support Systems, 45.
* McKee, T. (1976). Discriminant prediction of going concern status: A model for auditors. Selected Papers of the AAA Annual Meeting.
* Menon, K., & Schwartz, K. (1987). An empirical investigation of audit qualification decisions in the presence of going concern uncertainties. Contemporary Accounting Research, 3(2).
* Mokhatab Rafiei, F., Manzari, S. M., & Bostanian, S. (2011). Financial health prediction models using artificial neural networks, genetic algorithm and multivariate discriminant analysis: Iranian evidence. Expert Systems with Applications, 38.
* Soni, S. (2010). Implementation of multivariate data set by CART algorithm. Journal of Information Technology and Knowledge Management, 2(2).
* Sun, L., & Shenoy, P. (2007). Using Bayesian Networks for Bankruptcy Prediction: Some