This section opens with some general comments on the empirical challenges of making valid inferences about hypothesized gaps when financing decisions are subject to two types of confounding factors. The first set comprises factors that are usually considered legitimate determinants of credit or investment decisions. The second set consists of the factors across which gaps have been hypothesized. Factors in the latter category are problematic because they may be regarded as less than legitimate determinants of creditworthiness or investment readiness. Appropriate gap analyses must adjust for the effect of these confounding factors, which can be both serious and very subtle. To illustrate this crucial issue, the introductory section will conclude with a simple example, based on fictitious data, of the pitfalls involved in performing naïve "gap" analyses that ignore the effect of confounding factors.
The methodological problem posed by the gap analysis is to determine whether a particular variable, an (illegitimate) rationing criterion, affects financing outcomes or terms of financing. However, the impact of such variables cannot be measured directly (for example, through a simple crosstab) because numerous other variables, which we refer to as covariates, can legitimately affect financing-related outcomes (for example, credit/equity worthiness variables). Thus a simple breakdown across gap categories will be biased if the values of the "legitimate" variables also differ initially across categories of the rationing variable, as is usually the case. For example, sector of firm (arguably a legitimate determinant of creditworthiness) correlates with gender of owner (a potential rationing variable); hence, a simple crosstab of loan turndown frequencies across gender would be confounded by sectoral patterns. The Appendix to this paper presents a fictitious example of how this confounding effect may lead to embarrassing outcomes. Thus the problem at hand is to develop appropriate methods for removing this bias so that a plausible causal inference can be made (see Footnote 19).
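The confounding described above can be sketched numerically. The counts below are invented for illustration and are not the Appendix example; they assume that female-owned firms are concentrated in a sector with a higher turndown rate, while turndown rates within each sector are identical for both genders.

```python
# Fictitious counts (hypothetical, for illustration only):
# (sector, gender) -> (applications, turndowns)
applications = {
    ("services", "female"): (80, 24),
    ("services", "male"): (20, 6),
    ("manufacturing", "female"): (20, 2),
    ("manufacturing", "male"): (80, 8),
}


def turndown_rate(cells):
    """Pooled turndown rate over a list of (applications, turndowns) cells."""
    total_apps = sum(apps for apps, _ in cells)
    total_turndowns = sum(turned for _, turned in cells)
    return total_turndowns / total_apps


def naive_rate(gender):
    """Turndown rate for one gender, ignoring sector (the simple crosstab)."""
    cells = [v for (_, g), v in applications.items() if g == gender]
    return turndown_rate(cells)


def rate_within(sector, gender):
    """Turndown rate for one gender inside a single sector."""
    return turndown_rate([applications[(sector, gender)]])


# The naive crosstab suggests a gender gap ...
naive_gap = naive_rate("female") - naive_rate("male")

# ... but within each sector the turndown rates are the same.
sector_gaps = {
    sector: rate_within(sector, "female") - rate_within(sector, "male")
    for sector in ("services", "manufacturing")
}

print(f"naive gap: {naive_gap:.2f}")
for sector, gap in sector_gaps.items():
    print(f"gap within {sector}: {gap:.2f}")
```

In this constructed case the apparent gap in the pooled crosstab is entirely a sectoral effect, which is the kind of bias that the adjusted analyses are meant to remove.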
For each source of capital considered here, two aspects of the financing are of particular interest: the finance granting decision and the terms on which financing is advanced. The former is the suppliers' decision of whether or not to advance financing (for example, grant a loan application or not; provide venture capital or not; etc.). The latter aspect relates to the terms on which financing is advanced (for example, interest rates and collateral requirements etc. on loans; terms of venture capital contract; etc.).
To examine the finance granting decision (for example, loan approval or turndown) a logistic regression approach is recommended. This type of analysis uses a vector of linear predictors that includes standard determinants of eligibility for capital (call these the x's) and one or more possible rationing criteria that specify the hypothesized gap or gaps (call these the z's). Alternative rationing criteria include the gender of the owner, whether or not the company is knowledge-based, size, etc. The strategy for a given rationing criterion, zj, is to determine whether or not zj makes a significant contribution to the regression model, over and above the effect of the x's. This will be signaled by a significant logistic regression coefficient for zj. The logistic regression can be represented as
Logit (Loan Turndown) = ƒ {Rationing Criteria (z)} + ƒ {Moderating Variables (x)}
where the logit is the logarithm of the ratio of the probability of a turndown to the probability of a successful loan application, and ƒ{·} denotes a linear combination of variables.
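Written out with explicit coefficients, the model might look as follows. The symbols p_i, \beta and \gamma are notation introduced here for illustration and do not appear in the source; p_i denotes the turndown probability for firm i.

```latex
\operatorname{logit}(p_i) \;=\; \log\frac{p_i}{1 - p_i}
  \;=\; \beta_0 \;+\; \sum_{k} \beta_k\, x_{ik} \;+\; \sum_{j} \gamma_j\, z_{ij},
\qquad
H_0 :\ \gamma_j = 0 .
```

Under this notation, the test for a gap along rationing criterion z_j is a test of whether \gamma_j differs significantly from zero once the x's are in the model.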
Company size as a rationing variable is of particular interest methodologically (as well as substantively) as it is also one of the primary stratification variables used in the demand-side baseline survey design. Since there are far more small companies than large companies, the sampling fractions will vary across size strata, with the smallest sampling fraction being assigned to the stratum containing small companies (details to be verified). Thus units (companies) will be selected into the sample with unequal probabilities, so that the units in the sample will have associated survey weights that vary across strata. To the extent that Statistics Canada uses post-stratification to improve survey accuracy, and / or weighting adjustments to correct for non-response, additional variation will be incorporated into the survey weights.
Whenever unequal survey weights are encountered, the analyst must decide whether to use the weights and perform a design-based weighted analysis, or to ignore the weights and perform a model-based analysis. There is an extensive literature on this (see the reviews by Thomas, 1993; Korn and Graubard, 1999; Lohr, 1999), and there are advantages and disadvantages to both approaches.
The consensus now is that both analyses, weighted and unweighted, should be carried out and compared, in order to assess the magnitude of any bias in the unweighted analysis. Korn and Graubard (1999) also describe a method for assessing the degree of relative inefficiency introduced by the weighted analysis. The choice of which strategy to use (weighted or unweighted) will be made after due consideration of the bias and efficiency trade-off. All variance estimation for the design-based analysis must follow design principles — the standard "weighted" analyses available in SPSS and SAS, for example, will give incorrect variances in the weighted case.
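A minimal sketch of the weighted versus unweighted comparison for a simple turndown rate is given below. The sample, the weights and the function names are all hypothetical. The efficiency measure shown is Kish's approximation of the design effect due to unequal weighting, used here as a simple stand-in; it is not the method of Korn and Graubard (1999). The sketch produces point estimates only and does not address design-based variance estimation.

```python
# Hypothetical stratified sample: each record is (design_weight, turned_down).
# Weights are assumed to be inverse sampling fractions; all values are invented.
sample = (
    [(50.0, 1)] * 6 + [(50.0, 0)] * 14   # small firms, sampled 1 in 50
    + [(5.0, 1)] * 2 + [(5.0, 0)] * 18   # large firms, sampled 1 in 5
)


def unweighted_rate(records):
    """Model-based style estimate: every sampled firm counts equally."""
    return sum(y for _, y in records) / len(records)


def weighted_rate(records):
    """Design-based style estimate: each firm counts in proportion to its weight."""
    total_weight = sum(w for w, _ in records)
    return sum(w * y for w, y in records) / total_weight


def kish_design_effect(records):
    """Kish's approximation n * sum(w^2) / (sum(w))^2.

    Values above 1 indicate a loss of precision caused by unequal weights.
    """
    n = len(records)
    sum_w = sum(w for w, _ in records)
    sum_w2 = sum(w * w for w, _ in records)
    return n * sum_w2 / (sum_w ** 2)


print(f"unweighted turndown rate: {unweighted_rate(sample):.3f}")
print(f"weighted turndown rate:   {weighted_rate(sample):.3f}")
print(f"approximate design effect from weighting: {kish_design_effect(sample):.2f}")
```

A large difference between the two rates points to possible bias in the unweighted analysis, while a large design effect points to the efficiency cost of weighting.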
It will be particularly important to perform extensive regression diagnostics and studies of model fit. It is always easier to perform such investigations for multiple linear regression than for multiple logistic regression. However, strategies for examining fit have been developed in the latter case also (see, for example, Hosmer and Lemeshow, 1989). Some of these have been adapted to the logistic case, as described by Korn and Graubard (1999). Goodness of fit tables based on weighted data can be constructed to assess model fit. Partial residual plots can be constructed to examine residual variation relative to single variables, which is useful for assessing linearity and for identifying influential observations. Interactions among the x's and between the x's and the z's must also be systematically explored, as the "gap" under examination may not be uniform but may vary depending on legitimate determinants of credit/equity worthiness.
In summary, it is critical that a thorough search be made for the best fitting model, since the determination of a gap will depend on the presence in the model of a significant coefficient for the gap variable zj. If this significant coefficient could be rendered insignificant through the addition of other (non-z) variables and / or interactions to the model, the case for the gap would disappear. Given the public policy implications, we must ensure (within the data limitations of the survey) that all plausible alternatives (Campbell and Stanley, 1966) have been accounted for.
For this discussion, we will use terms of credit as an example; however, the discussion applies generally to each of the various types of financing investigated under the SME FDI.
The terms of credit to be investigated include: interest rate on the loan; collateral requirements; personal guarantees; and documentation requirements. Assuming that these can all be represented in terms of ratio or interval scaled variables, a strategy similar to that described above can be used to investigate the existence of specific gaps. Instead of using logistic regression, the vector of measures of credit terms can be modeled using multivariate multiple regression. With the possible rationing variables (gaps) represented by categorical variables, and the moderating variables representing legitimate determinants of creditworthiness represented by continuous and / or categorical variables, this analysis amounts to multivariate analysis of covariance (MANCOVA). If a "gap" variable is significant after adjustment for the effect of covariates, then this will constitute evidence of the existence of the postulated gap in the terms of credit. The MANCOVA model may be conceptualized as follows.
The issue of weighted versus unweighted analyses is again relevant. An unweighted analysis, if valid, can proceed using the classical MANCOVA approach, the "gap" test comprising an adjusted test of a main effect. The usual multivariate tests (Hotelling's, Wilks) can be applied, subject to an appropriate assessment of the model assumptions. In addition to the model exploration described above, the assumptions in this case include equality of covariance matrices, avoidance of extreme non-normality, as well as testing for parallelism of covariate slopes in the two (or more) "gap" categories, e.g., size strata. A weighted design-based analysis will accomplish similar goals, though the formulation will appear to be quite different, and some of the model-based assumptions (e.g. covariance equality, slope equality) will not be relevant. It is interesting to note that design-based MANCOVA analyses have not been explicitly described in the literature, though they can be effected using fairly standard techniques, as described below.
In the weighted case, it is convenient to represent the MANCOVA as a system of correlated multiple regressions, one for each of the individual terms of credit. The regression coefficients for the "gap" variables (one or more) and the covariates (the x's) will in this case correspond to a MANCOVA model in which the assumptions of equality of covariances and of covariate slopes are completely relaxed. Hypothesis tests in the weighted case (to identify specific gaps, and to explore model adequacy) will be constructed using Wald tests; examples of Wald tests for a variety of regression situations are described by Korn and Graubard (1999). It should be noted that the correlations among the various terms of credit can be accounted for in these hypothesis tests. This will require that design-based estimates of correlations include correlations across equations as well as between different covariates. Such correlations can be readily obtained if Statistics Canada variance estimation is based on a replication strategy (jackknifing or bootstrapping). It was noted in the previous section that weighted and unweighted analyses should be compared in order to assess possible biases. Again, this would be greatly facilitated if replicate estimates of variance could be obtained.
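The combination of replication-based variance estimation and a Wald test can be sketched for the simplest case: a single weighted contrast in mean interest rate between two gap categories. The data, category labels and function names are hypothetical, and the sketch assumes an unstratified delete-one jackknife. An actual analysis would use the replicate weights supplied with the survey and a multi-parameter Wald statistic.

```python
# Hypothetical records: (design_weight, gap_category, interest_rate_percent).
records = [
    (10.0, "small", 8.5), (10.0, "small", 9.0),
    (12.0, "small", 8.0), (12.0, "small", 9.5),
    (3.0, "large", 7.0), (3.0, "large", 7.5),
    (4.0, "large", 6.5), (4.0, "large", 7.0),
]


def weighted_mean(rows, category):
    """Survey-weighted mean outcome for one gap category."""
    selected = [(w, y) for w, g, y in rows if g == category]
    return sum(w * y for w, y in selected) / sum(w for w, _ in selected)


def gap_estimate(rows):
    """Weighted difference in mean interest rate, small minus large."""
    return weighted_mean(rows, "small") - weighted_mean(rows, "large")


def jackknife_variance(rows, estimator):
    """Delete-one jackknife: (n - 1) / n * sum of squared replicate deviations."""
    n = len(rows)
    replicates = [estimator(rows[:i] + rows[i + 1:]) for i in range(n)]
    mean_replicate = sum(replicates) / n
    return (n - 1) / n * sum((r - mean_replicate) ** 2 for r in replicates)


gap = gap_estimate(records)
variance = jackknife_variance(records, gap_estimate)

# One-parameter Wald statistic; under the null hypothesis of no gap it is
# referred to a chi-square distribution with 1 degree of freedom
# (5% critical value approximately 3.84).
wald_statistic = gap ** 2 / variance

print(f"estimated gap: {gap:.3f} percentage points")
print(f"jackknife variance: {variance:.4f}")
print(f"Wald statistic: {wald_statistic:.2f}")
```

Because the same replicates can be reused for every equation in the system, the cross-equation covariances mentioned above can be estimated from them in the same way.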
Methods for Utilizing Categorical Data
The logistic regression and MANCOVA analyses described above incorporate continuously measured variables, for the covariates in the logistic regression case, and for the dependent variables in the MANCOVA case. For many "gap" analyses, however, both dependent variables and covariates (legitimate covariates as well as other rationing criteria) will be measured as categorical variables, in which case it may be more natural to proceed using analysis techniques specifically designed for categorical data. A brief outline of some relevant techniques is provided below.
When all explanatory variables are categorical, logistic regression can be represented and estimated through the framework of loglinear models, an approach that allows for the representation of ordinal as well as nominal explanatory variables (see, for example, Agresti, 1990, Chapter 8). As an illustration, consider a three way contingency table featuring a two category response variable R and two explanatory variables (covariates, rationing variables) X and Z, having J and K categories, respectively. The extension to more explanatory variables is routine. A loglinear model can be written for either the counts or the proportion of observations falling in each cell of the table. It is more convenient to use the proportions, since for survey weighted data, the raw counts will not be meaningful, while the proportions can be unbiasedly estimated.
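For the three way table just described, one standard textbook form of the model, expressed as a logit for the response R, is sketched below. The notation is introduced here for illustration: \pi_{r \mid jk} denotes the proportion of units with response r among those in category j of X and category k of Z, and the constraints shown are one common identifiability convention.

```latex
\log\frac{\pi_{1 \mid jk}}{\pi_{2 \mid jk}}
  \;=\; \alpha + \beta^{X}_{j} + \beta^{Z}_{k},
\qquad j = 1,\dots,J,\quad k = 1,\dots,K,
\qquad \beta^{X}_{1} = \beta^{Z}_{1} = 0 .
```

This main-effects logit model corresponds to the loglinear model containing the XZ, RX and RZ associations. If Z is the rationing variable, the absence of a gap corresponds to \beta^{Z}_{k} = 0 for all k.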
The formulation can be modified to account for ordinal explanatory variables. In principle, it does not differ from the originally described logistic regression approach, which can incorporate discrete as well as continuous explanatory variables. However, most logistic regression software requires that the user provide the appropriate parameterization of the discrete explanatory variables, and if ordinal parameterizations and / or interactions between different explanatory variables are to be explored, this can become tedious. Software focused specifically on discrete data via the loglinear model paradigm is often far more convenient to use. Parameter estimation is usually done via maximum likelihood (ML). However, for weighted survey data, though ML results in design consistent estimates, it results in biased estimates of variances. Appropriate variance estimation techniques are available that account for the effect of the survey design (see, for example, Binder, 1983; Roberts, Rao and Kumar, 1987; Rao and Thomas, 1988).
As discussed above, a MANCOVA analysis can be decomposed into a set of multiple regression analyses, a step that is particularly convenient if one potential "gap" variable is to be adjusted for the effects of other postulated "gap" variables, in addition to the legitimate covariates and rationing variables. This case differs from the above in that the categorical response variable(s) will now have more than two categories, i.e., multi-category logit models will be required to provide the categorical analogue of multiple regression. In many cases, the loglinear model framework can be used to generate such models. A variety of multi-category models is described by a number of authors, with an excellent account provided by Agresti (1990, Chapter 9). Besides extensions of the logit model described above, Agresti describes a class of "mean response" models, in which linear models are used to directly model functions of the response proportions (Agresti, 1990, Section 9.6). These models are more difficult to estimate using maximum likelihood approaches, though methods exist, but they can be readily estimated using the method of weighted least squares. This approach is convenient as it can be readily adapted for use with weighted survey data, as described by Koch, Freeman and Freeman (1975).
The approaches described above are similar in that the search for a "gap" is dependent on a model, irrespective of the type of estimation and inference used (unweighted model-based or weighted design-based). It is for this reason that the need for model diagnostics was stressed – selection of an incomplete model might result in the erroneous identification of a gap (or no gap).
There are alternative techniques that place less reliance on a model linking loan success or terms of credit to credit worthiness and possible rationing (gap) criteria. The general principle behind the available methods is to match companies in different "gap" categories (e.g., male versus female company principal) on the basis of the moderating variables or covariates, namely the legitimate measures of credit worthiness. This balances the covariates across the gap categories, so that a simple comparison of loan outcome or credit terms between the balanced groups will comprise an approximately unbiased assessment of the gap.
For a two-category gap variable (e.g., male versus female), one method for effecting this covariate balancing is to categorize units (companies) on the basis of their propensity scores (Rosenbaum and Rubin, 1984). A propensity score is the probability that a particular unit belongs to a given reference category (e.g., the male group), given its covariate values. Rosenbaum and Rubin (1983, 1984) used logistic regression to estimate these propensities and placed all units in one of a set of equally spaced propensity groups (0 - 0.1, 0.1 - 0.2, etc.). To compare terms of credit, for example, the average interest rate for males and females in each propensity group would be calculated. The unweighted mean of the differences between corresponding propensity group means would then provide an overall male – female interest rate contrast, balanced for the effect of initial differences in credit worthiness. For use in this study, this method would have to be adapted to accommodate survey weights, a routine task.
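The propensity grouping procedure can be sketched as follows. The propensity scores are taken as given (in practice they would be estimated by logistic regression), and all values and function names are hypothetical. Survey weights are not handled, and dropping groups that lack either males or females is an assumption of this sketch rather than a rule stated above.

```python
# Hypothetical inputs: (propensity_score, group, interest_rate_percent).
firms = [
    (0.15, "male", 7.0), (0.12, "female", 7.4), (0.18, "female", 7.6),
    (0.55, "male", 6.0), (0.52, "male", 6.4), (0.58, "female", 6.7),
    (0.85, "male", 5.5), (0.88, "female", 5.9),
    (0.95, "male", 5.0),  # propensity group with no female firms
]


def stratum_index(score, n_strata=10):
    """Equal-width propensity groups; a score of 1.0 goes in the top group."""
    return min(int(score * n_strata), n_strata - 1)


def stratified_contrast(rows, n_strata=10):
    """Unweighted mean of within-group (female minus male) mean differences."""
    by_stratum = {}
    for score, group, outcome in rows:
        stratum = by_stratum.setdefault(stratum_index(score, n_strata), {})
        stratum.setdefault(group, []).append(outcome)

    differences = []
    for groups in by_stratum.values():
        # Assumption: groups lacking either category cannot contribute.
        if "male" in groups and "female" in groups:
            male_mean = sum(groups["male"]) / len(groups["male"])
            female_mean = sum(groups["female"]) / len(groups["female"])
            differences.append(female_mean - male_mean)
    return sum(differences) / len(differences)


contrast = stratified_contrast(firms)
print(f"balanced female minus male interest rate contrast: {contrast:.3f}")
```

Only three of the propensity groups contain both categories in this invented data set, so the contrast is the simple average of three within-group differences.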
Other techniques for adjusting for the effects of covariate imbalance based on pair-wise matching have been suggested in the literature. In particular, matching can be based on the propensity scores themselves (Rubin, 1979) or on Mahalanobis distance (Rubin, 1980). An advantage of the Mahalanobis distance method is that it is essentially model free, unlike the propensity score method, though the latter is easier to implement. However, both these methods have a disadvantage in the current context, in that they are designed for unweighted data. It is not immediately clear how pair-wise matching techniques should be adapted to the complex survey case, where different cases, having different weights, represent different numbers of population units. For this reason, pair-wise matching methods are not recommended in this study.
The Market for Commercial Loans
Given the above research challenges, a two-stage gap analysis framework is suggested:
The research literature has shown that a lender's decision to grant a loan, or not, is based on a variety of factors. It is proposed that the approach for testing of credit gaps be vested in the following framework. According to this model, determinants of credit outcomes such as size of firm, efficiency of firm, and owner(s)' skills and experience determine credit; however, in the presence of a gap, the gap dimension (e.g., size, gender, etc.) acts as a filter through which credit determinants may differentially affect the credit outcome.
A multivariate analytical framework is therefore mandated. Both approaches would employ a common set of potential rationing criteria and control variables. Based on the findings from the literature reviewed earlier in this study, these are listed in Table 9.
Determinants of Loan Turndowns
The sub-sample of interest for this analysis comprises those firms that have reported applying for a commercial loan during the period investigated by the questionnaire. The dependent variable would be either a binomial measure of the loan decision outcome (whether or not the loan had been turned down) or multinomial (whether the loan had been turned down, approved in full, approved in part). To comply with the statistical assumptions, the loan decision outcome would need to be transformed using a logit-type transformation. Potential drivers of the loan decision include such factors as sector, skills of owner(s), age of firm, etc. (see Wynant and Hatch, 1991; CBA, 1998; and Haines and Riding, 1994; among others).
Several statistical approaches are possible, especially with such large samples. A complication of the analysis is the likelihood that potential rationing criteria correlate with moderating variables (for example, gender with sector) and that moderating variables may be correlated among themselves (for example, size and age of firm). This collinearity condition may hamper direct interpretation of regression coefficients.
For the loan granting decision, a multivariate logistic regression model with a binary dependent variable (loan granted or not) is proposed.
Logit (Loan Turndown) = ƒ {Rationing Criterion} + ƒ {Moderating Variables}
Candidate variables for this regression are listed in Table 9. The outcome variable would be a logit transformation of whether or not the firm's application for a commercial loan had been turned down. Moderating variables here would include those factors that are generally accepted determinants of the credit decision as listed in Table 9. The rationing criteria would be the particular dimensions suspected of being the basis of a gap or imperfection.
Statistical modeling of terms of credit may be undertaken in several ways. One approach is based on that of Toivanen and Cressy (2000) who used a system of equations based on a theoretical model of the bank-SME relationship. Another approach is based on MANCOVA, as described above.
In the MANCOVA approach, the dependent variable is a vector of outcome variables: the set of measures of credit terms available from the data (which will depend on the quality of the survey data). This should include interest rate, collateral requirements, fees, and documentation requirements (as available). MANCOVA then identifies the extent to which this vector of outcomes is statistically related to one or more fixed factors (here, the potential bases for credit rationing) while controlling for the covariates (in this instance, plausible determinants of terms of credit). These variables are detailed in Table 9. MANCOVA provides a relatively robust means of investigating the types of relationships that are of interest here.
In the simultaneous equations approach, each term of credit becomes the dependent variable of a single equation and is related to rationing criteria and control variables as regressors. In addition, the dependent variable from one equation may be used as a regressor in another equation in the system. For example, separate equations might be used to estimate the interest rate and the level of collateral requested; however, it is also reasonable to expect the amount of collateral requested as a condition of the loan to be among the determinants of the interest rate.
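The interest rate and collateral example might be written as the two-equation system below. This is an illustrative sketch only and is not the specification of Toivanen and Cressy (2000); the symbols r_i (interest rate), c_i (collateral requested), z_i (rationing criterion), \mathbf{x}_i (control variables) and the coefficients are notation introduced here.

```latex
\begin{aligned}
r_i &= \alpha_1 + \gamma_1 z_i + \boldsymbol{\beta}_1^{\top}\mathbf{x}_i + \delta\, c_i + \varepsilon_{1i},\\
c_i &= \alpha_2 + \gamma_2 z_i + \boldsymbol{\beta}_2^{\top}\mathbf{x}_i + \varepsilon_{2i}.
\end{aligned}
```

In this notation, \gamma_1 and \gamma_2 capture the gap in each term of credit, and \delta captures the dependence of the interest rate on collateral. If the two error terms are correlated, c_i is endogenous in the first equation, so ordinary least squares applied to that equation alone would generally be inconsistent and a systems estimator would be needed.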
Extensions of the Model to Leasing
The framework outlined above is specified in terms of the commercial loan market. However, it is also appropriate (with virtually no modification of the independent variables) for application to the leasing market. In this case, the two primary changes would be to the data selected for analysis and to the outcome variables. Leasing is conceptually very similar to debt: both financing approaches involve a contractual promise to make periodic payments following provision of capital or use of an asset. Thus the analytical approach would be virtually identical.
The data used for this analysis would comprise those businesses that applied for a lease. As before, logistic regression would be used to model the lease turndown / acceptance decision and candidate control and rationing variables are listed in Table 9. In addition, binomial variables that describe the category of asset being leased should also be introduced among the control variables.
For those firms that were successful in their lease applications, the terms of leasing would be modeled exactly as for terms of commercial loans: descriptions of terms of lease would be expressed as functions of the rationing and control variables listed in Table 9, the latter augmented by binomial variables describing the category of assets being leased.
A Framework for Analysis of Gaps in the Canadian Markets for Informal and Venture Capital
The data to be collected under the terms of the SME FDI may help to further inform the debate about the level of investment collectively made by private investors and venture capital firms. The findings from the sample, properly weighted and scaled up to the Canadian population of firms, would provide a useful starting point for this estimate.
In addition, the research literature has consistently reported that informal investors focus on early stage growth-oriented businesses. This finding, which characterizes the academic literature, appears to be at variance with the gaps postulated by the BDC. This may be resolved by breakdowns by stage of business, size of firm, and growth record of firms that have reported receipt of informal capital. If the BDC position is correct, early stage firms, small firms, and growth firms ought not to have received informal capital any more frequently than, respectively, later stage enterprises, large firms, and firms that have not grown.
Ideally, the research framework suggested to examine the postulated gaps would be very similar to that described above for the debt market (see Footnote 20). However, it is understood that the number of respondents who have sought informal and venture capital may be small. The reliability of logistic regression or MANCOVA will depend ultimately on the number of respondents who have sought equity capital and the quality of these data. This makes model specification all the more important, and all the more challenging. Complicating the situation is the finding that venture and informal capital investors generally base their investment decisions on their assessments of the future state of the enterprise and on its potential. Generally speaking, these assessments are not observable from surveys of the current status of businesses.
To investigate gaps posited by the BDC, it may therefore be best to undertake additional specialized studies. One such study is suggested by the BDC contention that the Canadian venture capital market includes an institutional gap that reflects a lack of involvement of pension funds, mutual funds, and other such institutions. This cannot be tested from the baseline SME FDI data in any case, and a specialized study would be mandated to resolve this question.
A second specialized study could employ findings from the baseline survey. The baseline survey could provide the identities of (a) firms that have sought informal or venture capital successfully and (b) firms that were not successful in obtaining such financing. The baseline survey would provide extremely useful business demographic data on both categories of firms. The specialized study recommended here would involve returning to a select number of both categories of firms to develop case histories of their respective quests for risk capital. Such histories could provide in-depth quantitative and qualitative data of:
Footnote 19 This problem differs slightly from many bias reduction studies in that many of the legitimate credit/equity worthiness variables can be enumerated and measured. This is in contrast to many bias reduction situations in which there is always the possibility that unknown and unmeasured covariates remain to bias the group comparison after the effects of all known covariates have been removed. Nevertheless, even in the present case, if a particular "gap" effect is identified after due adjustment for the covariates, this still does not necessarily constitute proof that the postulated "gap" variable is causally related to the observed differences in financing outcomes. For example, one can conclude that it is not imbalances in the legitimate covariates (i.e., the credit/equity worthiness measures) that are responsible for the gap. However, there may be other rationing criteria (perhaps unmeasured) that are correlated with the outcome financing measures, and that are causing the observed effects. In other words, there may remain what Campbell and Stanley (1966) called "plausible alternative hypotheses" to the postulated source of the measured gap. If variables representing such plausible alternatives have been measured, their effect can be explored, which will then exhaust all possibilities for statistical bias reduction. If a statistically significant gap remains, then the validity of the causal inference will rest on the care taken by the survey designers to include and measure all relevant variables.
Footnote 20 As before, to model access to the various forms of equity capital, the ideal approach would be to use a multivariate analysis based on logistic regression. The dependent variable would measure whether or not the firm received private investment (alternatively, venture capital). The rationing criterion would be the particular dimension hypothesized as being the basis of a gap or imperfection (hypothesized dimensions include size, stage, KBI orientation, etc.). Moderating variables here would include those factors that are generally accepted determinants of equity investors' decision criteria. Based on previous work by Mason and his colleagues (1996, 2001) and Feeney, Haines and Riding (1999), among others, these include measures of owner(s)' skills and experience, industry and geographic sector variables, growth orientation of the firm, and demographic attributes of the business.