Before starting any kind of analysis, classify the data set as either continuous or attribute; in many cases it is a blend of both types. Continuous data are characterized by variables that can be measured on a continuous scale, such as time, temperature, strength, or monetary value. A useful test is to divide the value in half and see whether it still makes sense.

Attribute, or discrete, data can be associated with a defined grouping and then counted. Examples are classifications of positive and negative, location, vendors’ materials, product or process types, and scales of satisfaction such as poor, fair, good, and excellent. Once an item is classified, it can be counted and its frequency of occurrence can be determined.

The next determination to make is whether the data are inputs or outputs. Output variables are often referred to as the CTQs (critical to quality characteristics) or performance measures. Input variables are what drive the resulting outcomes. We generally characterize a product, process, or service delivery outcome (the Y) as a function of the input variables X1, X2, X3, … Xn. The Y’s are driven by the X’s.

The Y outcomes can be either continuous or discrete data. Examples of continuous Y’s are cycle time, cost, and productivity. Examples of discrete Y’s are delivery performance (late or on time), invoice accuracy (accurate or not accurate), and application errors (wrong address, misspelled name, missing age, etc.).

The X inputs can also be either continuous or discrete. Examples of continuous X’s are temperature, pressure, speed, and volume. Examples of discrete X’s are process step (intake, examination, treatment, and discharge), product type (A, B, C, and D), and vendor material (A, B, C, and D).

Another set of X inputs to always consider is the stratification factors. These are variables that may influence product, process, or service delivery performance and should not be overlooked. If we capture this information during data collection, we can study it to determine whether it is important. Examples are time of day, day of the week, month of the year, season, location, region, or shift.
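One way to study captured stratification factors is to group the output by the levels of each factor and compare the averages. The sketch below does this in plain Python; the record layout, field names, and cycle-time values are hypothetical.

```python
"""Compare an output Y across the levels of a stratification factor (a sketch)."""
from collections import defaultdict
from statistics import mean


def mean_by_factor(records, factor, output):
    """Group records by one stratification factor and average the output."""
    groups = defaultdict(list)
    for record in records:
        groups[record[factor]].append(record[output])
    return {level: mean(values) for level, values in groups.items()}


# Hypothetical records: each keeps the output Y plus the stratification factors.
records = [
    {"shift": "day", "weekday": "Mon", "cycle_time": 12.0},
    {"shift": "night", "weekday": "Mon", "cycle_time": 15.0},
    {"shift": "day", "weekday": "Tue", "cycle_time": 14.0},
    {"shift": "night", "weekday": "Tue", "cycle_time": 17.0},
]

by_shift = mean_by_factor(records, "shift", "cycle_time")
by_weekday = mean_by_factor(records, "weekday", "cycle_time")
```

A noticeable gap between levels suggests the factor may matter and is worth testing formally with the tools discussed below.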

Now that the inputs can be separated from the outputs and the data can be classified as either continuous or discrete, selecting the statistical tool to apply comes down to answering the question, “What is it that we want to know?” Here is a summary of common questions; we’ll address each one separately.

What is the baseline performance? Did the changes made to the process, product, or service delivery make a difference? What are the relationships between the multiple input X’s and the output Y’s? If there are relationships, do they make a significant difference? That’s enough questions to be statistically dangerous, so let’s tackle them one by one.

What is baseline performance?

Continuous Data – Plot the data in a time-based sequence using an X-MR (individuals and moving range) control chart, or subgroup the data and use an Xbar-R (averages and range) control chart. The centerline of the chart gives an estimate of the average of the data over time, thus establishing the baseline. The MR or R chart provides an estimate of the variation over time and establishes the upper and lower 3 standard deviation control limits for the X or Xbar chart. Create a Histogram of the data to see a graphical representation of its distribution, test it for normality (the p-value should be greater than .05), and compare it to the specifications to evaluate capability.
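The X-MR arithmetic can be sketched in a few lines of Python. The sketch assumes the standard constants for moving ranges of two consecutive points (d2 = 1.128 and D4 = 3.267), estimates sigma as the average moving range divided by d2, and adds a Cpk calculation against hypothetical specification limits. The data and limits are invented for illustration; this is not a reproduction of Minitab's calculations.

```python
"""Individuals (X) and moving range (MR) chart limits, plus Cpk (a sketch)."""
from statistics import mean

D2 = 1.128  # bias constant for moving ranges of span 2
D4 = 3.267  # upper control limit factor for the MR chart (its lower limit is 0)


def moving_ranges(values):
    """Absolute differences between consecutive observations."""
    return [abs(b - a) for a, b in zip(values, values[1:])]


def xmr_limits(values):
    """Centerlines and 3-sigma control limits for the X and MR charts."""
    if len(values) < 2:
        raise ValueError("need at least two observations")
    x_bar = mean(values)
    mr_bar = mean(moving_ranges(values))
    sigma = mr_bar / D2  # within (short-term) standard deviation estimate
    return {
        "x_center": x_bar,
        "x_lcl": x_bar - 3 * sigma,
        "x_ucl": x_bar + 3 * sigma,
        "mr_center": mr_bar,
        "mr_lcl": 0.0,
        "mr_ucl": D4 * mr_bar,
    }


def cpk(values, lsl, usl):
    """Capability index using the MR-based (within) sigma estimate."""
    x_bar = mean(values)
    sigma = mean(moving_ranges(values)) / D2
    return min(usl - x_bar, x_bar - lsl) / (3 * sigma)


# Hypothetical cycle times (minutes) in time order, with hypothetical specs.
cycle_times = [10, 12, 11, 13, 12]
limits = xmr_limits(cycle_times)
capability = cpk(cycle_times, lsl=8, usl=16)
```

Points outside `x_lcl` to `x_ucl`, or moving ranges above `mr_ucl`, would signal special-cause variation to investigate before treating the centerline as the baseline.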

Minitab Statistical Software Tools are Variables Control Charts, Histograms, Graphical Summary, Normality Test, and Capability Study (between and within).

Discrete Data – Plot the data in a time-based sequence using a P Chart (percent defective chart), C Chart (count of defects chart), nP Chart (sample size n times percent defective chart), or U Chart (defects per unit chart). The centerline provides the baseline average performance. The upper and lower control limits are set 3 standard deviations above and below the average, a span that covers about 99.73% of expected activity over time. This gives an estimate of the worst-case and best-case scenarios before any improvements are implemented. Create a Pareto Chart to view the distribution of the categories and their frequencies of occurrence. If the control charts exhibit only natural patterns of variation over time (only common cause variation, no special causes), the centerline, or average value, establishes the capability.
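A P chart's limits and a Pareto table can be sketched the same way. The sketch assumes the usual binomial form of the P chart limits, p_bar ± 3 * sqrt(p_bar * (1 - p_bar) / n), computes them per sample so unequal sample sizes are handled, and clips them to the range 0 to 1. The defect counts and error records are hypothetical.

```python
"""P chart limits and a Pareto table (a sketch)."""
from collections import Counter
from math import sqrt


def p_chart(defectives, sample_sizes):
    """Return p_bar and per-sample (LCL, UCL) pairs, clipped to [0, 1]."""
    p_bar = sum(defectives) / sum(sample_sizes)
    limits = []
    for n in sample_sizes:
        half_width = 3 * sqrt(p_bar * (1 - p_bar) / n)
        limits.append((max(0.0, p_bar - half_width), min(1.0, p_bar + half_width)))
    return p_bar, limits


def pareto_table(observations):
    """Categories ordered by frequency, with cumulative percentages."""
    counts = Counter(observations).most_common()
    total = sum(count for _, count in counts)
    table, running = [], 0
    for category, count in counts:
        running += count
        table.append((category, count, 100 * running / total))
    return table


# Hypothetical daily counts of defective applications out of 100 reviewed.
p_bar, limits = p_chart([5, 3, 4, 8], [100, 100, 100, 100])

# Hypothetical application errors, using the categories named earlier.
errors = ["wrong address", "misspelled name", "wrong address",
          "missing age", "wrong address", "misspelled name"]
pareto = pareto_table(errors)
```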

Minitab Statistical Software Tools are Attributes Control Charts and Pareto Analysis.

Did the changes made to the process, product, or service delivery make a difference?

Discrete X – Continuous Y – To evaluate whether two group averages (5W-30 vs. Synthetic Oil) affect gasoline consumption, use a T-Test. If there are potential environmental concerns that may influence the test results, use a Paired T-Test. Plot the results on a Boxplot and assess the T statistic with its p-value to make a decision (a p-value less than or equal to .05 indicates that a difference exists at the 95% confidence level). If there is a difference, choose the group with the best overall average to meet the objective.
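Both t-tests can be sketched with the Python standard library alone. Because the standard library has no t distribution, the sketch computes the two-sided p-value from the regularized incomplete beta function, evaluated with a standard continued-fraction expansion. The two-sample version uses Welch's form, which does not assume equal variances; the paired version is a one-sample t-test on the pairwise differences. The mileage figures are hypothetical.

```python
"""Welch two-sample and paired t-tests using only the standard library (a sketch)."""
from math import exp, lgamma, log, sqrt
from statistics import mean, stdev


def _betacf(a, b, x, max_iter=300, eps=3e-14, tiny=1e-300):
    """Continued fraction for the incomplete beta function (modified Lentz)."""
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c, d = 1.0, 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even and odd coefficients of the m-th step of the continued fraction.
        for aa in (m * (b - m) * x / ((qam + m2) * (a + m2)),
                   -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            h *= d * c
        if abs(d * c - 1.0) < eps:
            break
    return h


def _betainc(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = exp(lgamma(a + b) - lgamma(a) - lgamma(b) + a * log(x) + b * log(1.0 - x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def t_two_sided_p(t, df):
    """Two-sided p-value for Student's t with df degrees of freedom."""
    return _betainc(df / 2.0, 0.5, df / (df + t * t))


def welch_t_test(a, b):
    """Two-sample t-test that does not assume equal variances."""
    va, vb = stdev(a) ** 2 / len(a), stdev(b) ** 2 / len(b)
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df, t_two_sided_p(t, df)


def paired_t_test(first, second):
    """Paired t-test: a one-sample t-test on the pairwise differences."""
    diffs = [x - y for x, y in zip(first, second)]
    t = mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))
    df = len(diffs) - 1
    return t, df, t_two_sided_p(t, df)


# Hypothetical miles per gallon from repeated test drives with each oil.
mpg_5w30 = [31.2, 30.8, 31.5, 30.9, 31.1]
mpg_synthetic = [32.0, 31.7, 32.3, 31.9, 32.1]
t_stat, dof, p_value = welch_t_test(mpg_5w30, mpg_synthetic)

# Hypothetical paired runs: the same car and route, once with each oil.
run_5w30 = [30.1, 31.4, 29.8, 30.6]
run_synthetic = [30.9, 32.0, 30.7, 31.2]
pt_stat, pt_dof, pt_p = paired_t_test(run_5w30, run_synthetic)
```

If SciPy is available, `scipy.stats.ttest_ind(a, b, equal_var=False)` and `scipy.stats.ttest_rel(a, b)` perform the same two tests and are a better choice for production work.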

To test whether two or more group averages (5W-30, 5W-40, 10W-30, 10W-40, or Synthetic) affect fuel usage, use ANOVA (analysis of variance). Randomize the order of testing to minimize any time-dependent environmental influences on the test results. Plot the results on a Boxplot or Histogram and assess the F statistic with its p-value to make a decision (a p-value less than or equal to .05 indicates that a difference exists at the 95% confidence level). If there is a difference, select the group with the best overall average to meet the goal.

In either of the cases above, to find out whether the inputs differ in the amount of variation they cause in the output, use a Test for Equal Variances (homogeneity of variance). Use the p-value to make a decision (a p-value less than or equal to .05 indicates that a difference exists at the 95% confidence level). If there is a difference, select the group with the lowest standard deviation.
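One-way ANOVA and a test for equal variances can share one sketch. The F-test p-value comes from the regularized incomplete beta function, since P(F > f) = I_x(d2/2, d1/2) with x = d2 / (d2 + d1 * f). For equal variances the sketch uses the median-centered variant of Levene's test (often called Brown-Forsythe), which is simply an ANOVA on each observation's absolute deviation from its group median; whether this matches the method a given Minitab version reports is an assumption to check. The oil data are hypothetical.

```python
"""One-way ANOVA and a median-centered Levene test (a sketch)."""
from math import exp, lgamma, log
from statistics import mean, median


def _betacf(a, b, x, max_iter=300, eps=3e-14, tiny=1e-300):
    """Continued fraction for the incomplete beta function (modified Lentz)."""
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c, d = 1.0, 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        for aa in (m * (b - m) * x / ((qam + m2) * (a + m2)),
                   -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            h *= d * c
        if abs(d * c - 1.0) < eps:
            break
    return h


def _betainc(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = exp(lgamma(a + b) - lgamma(a) - lgamma(b) + a * log(x) + b * log(1.0 - x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def f_upper_p(f, d1, d2):
    """P(F > f) for an F distribution with (d1, d2) degrees of freedom."""
    if f <= 0.0:
        return 1.0
    return _betainc(d2 / 2.0, d1 / 2.0, d2 / (d2 + d1 * f))


def one_way_anova(groups):
    """Return (F, df_between, df_within, p-value) for a list of groups."""
    values = [v for g in groups for v in g]
    grand = mean(values)
    means = [mean(g) for g in groups]
    ss_between = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    df_between, df_within = len(groups) - 1, len(values) - len(groups)
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within, f_upper_p(f, df_between, df_within)


def levene_median(groups):
    """Levene's test on absolute deviations from group medians (Brown-Forsythe)."""
    return one_way_anova([[abs(v - median(g)) for v in g] for g in groups])


# Hypothetical miles per gallon for three oils, tested in randomized order.
oils = {
    "5W-30": [30, 31, 32],
    "10W-30": [33, 34, 35],
    "Synthetic": [36, 37, 38],
}
f_stat, df1, df2, p_value = one_way_anova(list(oils.values()))
lev_f, _, _, lev_p = levene_median(list(oils.values()))
```

Here the averages differ clearly while the spreads are identical, so the ANOVA p-value is small and the equal-variance p-value is large.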

Minitab Statistical Software Tools are 2 Sample T-Test, Paired T-Test, ANOVA, Test for Equal Variances, Boxplot, Histogram, and Graphical Summary.

Continuous X – Continuous Y – Plot the input X versus the output Y using a Scatter Plot or, if there are multiple input X variables, a Matrix Plot. The plot provides a graphical representation of the relationship between the variables. If it appears that a relationship may exist between one or more of the X input variables and the output Y variable, conduct a Linear Regression of one input X versus the output Y. Repeat as required for each X – Y relationship.

The Linear Regression model provides an R2 statistic, an F statistic, and a p-value. To be significant for a single X-Y relationship, the R2 should be greater than .36 (36% of the variation in the output Y is explained by the observed changes in the input X), the F should be much greater than 1, and the p-value should be .05 or less.
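The three single X-Y statistics can be computed directly from a least-squares fit. The sketch fits y = b0 + b1 * x, takes R2 as the share of the total sum of squares explained by the fit, forms F with 1 and n - 2 degrees of freedom, and obtains the p-value from the regularized incomplete beta function. The data are hypothetical.

```python
"""Simple linear regression: slope, intercept, R2, F, and p-value (a sketch)."""
from math import exp, lgamma, log
from statistics import mean


def _betacf(a, b, x, max_iter=300, eps=3e-14, tiny=1e-300):
    """Continued fraction for the incomplete beta function (modified Lentz)."""
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c, d = 1.0, 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        for aa in (m * (b - m) * x / ((qam + m2) * (a + m2)),
                   -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            h *= d * c
        if abs(d * c - 1.0) < eps:
            break
    return h


def _betainc(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = exp(lgamma(a + b) - lgamma(a) - lgamma(b) + a * log(x) + b * log(1.0 - x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def f_upper_p(f, d1, d2):
    """P(F > f) for an F distribution with (d1, d2) degrees of freedom."""
    if f <= 0.0:
        return 1.0
    return _betainc(d2 / 2.0, d1 / 2.0, d2 / (d2 + d1 * f))


def simple_regression(x, y):
    """Least-squares fit of y = b0 + b1 * x with its summary statistics."""
    n = len(x)
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    ss_total = sum((yi - y_bar) ** 2 for yi in y)
    ss_error = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    r2 = 1.0 - ss_error / ss_total
    f = (ss_total - ss_error) / (ss_error / (n - 2))  # 1 and n - 2 df
    return {"b0": b0, "b1": b1, "r2": r2, "f": f, "p": f_upper_p(f, 1, n - 2)}


# Hypothetical paired observations of one input X and the output Y.
x_input = [1.0, 2.0, 3.0, 4.0, 5.0]
y_output = [2.1, 3.9, 6.2, 7.8, 10.1]
result = simple_regression(x_input, y_output)
```

Against the rule of thumb above, the relationship counts as significant when `result["r2"]` exceeds 0.36, `result["f"]` is well above 1, and `result["p"]` is .05 or less.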

Minitab Statistical Software Tools are Scatter Plot, Matrix Plot, and Fitted Line Plot.

Discrete X – Discrete Y – In this type of analysis, categories, or groups, are compared to other categories, or groups. For example, “Which cruise line had the highest customer satisfaction?” The discrete X variable is the cruise line (RCI, Carnival, and Princess Cruise Lines). The discrete Y variable is the frequency of passenger responses on the satisfaction surveys by category (poor, fair, good, very good, and excellent) relating to their vacation experience.

Conduct a cross tab table analysis, or Chi Square analysis, to evaluate whether there were differences in levels of satisfaction among passengers depending on the cruise line they vacationed on. Percentages are used for the evaluation, and the Chi Square analysis provides a p-value to further quantify whether the differences are significant. The overall p-value of the Chi Square analysis should be .05 or less. The categories with the largest contributions to the Chi Square statistic drive the observed differences.

Minitab Statistical Software Tools are Table Analysis, Matrix Analysis, and Chi Square Analysis.

Continuous X – Discrete Y – Does the price per gallon of fuel influence consumer satisfaction? The continuous X is the cost per gallon of fuel. The discrete Y is the consumer satisfaction rating (unhappy, indifferent, or happy). Plot the data using Dot Plots stratified on Y. The statistical method is Logistic Regression. Once again, the p-values are used to validate whether or not a significant difference exists. A p-value of .05 or less indicates that a significant difference exists at the 95% confidence level. Use the most frequently occurring ratings to help make your determination.
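Logistic regression can be sketched for the simplest case: one continuous X and a two-level Y. The Y in this example has three ordered ratings; the sketch assumes they are collapsed to happy (1) versus not happy (0), which makes it a binary logistic regression, whereas keeping all three ordered levels would call for ordinal logistic regression (which Minitab also offers). The fit uses Newton-Raphson, and the slope's p-value comes from a Wald z-test. Prices and ratings are hypothetical.

```python
"""Binary logistic regression with one continuous X, fit by Newton-Raphson (a sketch)."""
from math import exp, sqrt
from statistics import NormalDist, mean


def logistic_fit(x, y, max_iter=50, tol=1e-10):
    """Return (b0, b1, p-value for b1) for P(y=1) = 1 / (1 + exp(-(b0 + b1*x)))."""
    x_bar = mean(x)
    xc = [xi - x_bar for xi in x]  # centering X improves numerical stability
    b0 = b1 = 0.0
    for _ in range(max_iter):
        g0 = g1 = i00 = i01 = i11 = 0.0
        for xi, yi in zip(xc, y):
            p = 1.0 / (1.0 + exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += yi - p          # gradient of the log-likelihood
            g1 += (yi - p) * xi
            i00 += w              # information matrix entries
            i01 += w * xi
            i11 += w * xi * xi
        det = i00 * i11 - i01 * i01
        # Newton step: (information matrix)^-1 times the gradient.
        step0 = (i11 * g0 - i01 * g1) / det
        step1 = (i00 * g1 - i01 * g0) / det
        b0, b1 = b0 + step0, b1 + step1
        if abs(step0) + abs(step1) < tol:
            break
    se_b1 = sqrt(i00 / det)  # from the inverse information matrix at convergence
    z = b1 / se_b1           # Wald statistic
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    # Undo the centering so b0 is the intercept on the original X scale.
    return b0 - b1 * x_bar, b1, p_value


# Hypothetical price per gallon and satisfaction (1 = happy, 0 = not happy).
price = [3.0, 3.1, 3.2, 3.3, 3.4, 3.5, 3.6, 3.7, 3.8, 3.9, 4.0, 4.1]
happy = [1, 1, 1, 1, 0, 1, 1, 0, 1, 0, 0, 0]
b0, b1, p_slope = logistic_fit(price, happy)
```

A negative `b1` means the odds of a happy rating fall as the price rises; `p_slope` of .05 or less would mark that effect as significant.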

Minitab Statistical Software Tools are Dot Plots stratified on Y and Logistic Regression Analysis.

Are there any relationships between the multiple input X’s and the output Y’s? If there are relationships, do they make a difference?

Continuous X – Continuous Y – The graphical analysis is a Matrix Scatter Plot, where multiple input X’s can be evaluated against the output Y characteristic. The statistical analysis method is multiple regression. Examine the scatter plots to look for relationships between the X input variables and the output Y. Also, look for multicollinearity, where one input X variable is correlated with another input X variable. This is analogous to double dipping, so we identify those conflicting inputs and systematically remove them from the model.

Multiple regression is a powerful tool, but it requires proceeding with caution. Run the model with all variables included, then review the T statistics and F statistics to identify the first set of insignificant variables to eliminate from the model. During the second iteration of the regression model, turn on the variance inflation factors, or VIFs, which are used to quantify potential multicollinearity issues (VIFs of 5 to 10 indicate issues). Review the Matrix Plot to identify X’s related to other X’s. Remove the variables with high VIFs and the largest p-values, but only remove one of the related X variables in a questionable pair. Assess the remaining p-values and remove variables with large p-values from the model. Don’t be surprised if the process requires more iterations.

When the multiple regression model is finalized, all VIFs should be less than 5 and all p-values should be less than .05. The R2 value should be 90% or greater. This is a significant model, and the regression equation can now be used to make predictions, as long as the input variables are kept within the min and max range values that were used to create the model.
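The VIF for each input X is 1 / (1 - R2), where R2 comes from regressing that X on all the other X's. The sketch below computes it with ordinary least squares solved through the normal equations and Gaussian elimination; that approach is simple but less numerically robust than the QR-based solvers statistics packages typically use. The three input columns are hypothetical, built so that x2 is nearly twice x1.

```python
"""Variance inflation factors via auxiliary least-squares regressions (a sketch)."""


def _solve(a, b):
    """Solve the linear system a @ x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def r_squared(columns, y):
    """R2 of an ordinary least-squares fit of y on the columns plus an intercept."""
    rows = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    beta = _solve(xtx, xty)  # normal equations: (X'X) beta = X'y
    y_bar = sum(y) / len(y)
    ss_error = sum((yi - sum(b * v for b, v in zip(beta, r))) ** 2
                   for r, yi in zip(rows, y))
    ss_total = sum((yi - y_bar) ** 2 for yi in y)
    return 1.0 - ss_error / ss_total


def vifs(columns):
    """VIF for each X: 1 / (1 - R2) from regressing it on all the other X's."""
    result = []
    for j, col in enumerate(columns):
        others = columns[:j] + columns[j + 1:]
        result.append(1.0 / (1.0 - r_squared(others, col)))
    return result


# Hypothetical inputs: x2 is nearly 2 * x1; x3 is only weakly related to either.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
x2 = [2.1, 3.9, 6.1, 7.9, 10.1, 11.9, 14.1, 15.9, 18.1, 19.9]
x3 = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0, 5.0, 3.0]
vif = vifs([x1, x2, x3])
```

Here x1 and x2 form the questionable pair, with very large VIFs, so only one of them would be removed before refitting.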

Minitab Statistical Software Tools are Regression Analysis, Stepwise Regression Analysis, Scatter Plots, Matrix Plots, Fitted Line Plots, Graphical Summary, and Histograms.

Discrete X and Continuous X – Continuous Y

This situation requires the use of designed experiments. Discrete and continuous X’s can be used as the input variables, but their settings are predetermined in the design of the experiment. The analysis technique is ANOVA, which was discussed earlier.

Here is an example. The objective is to reduce the number of unpopped kernels of popping corn in a bag of popped popcorn (the output Y). Discrete X’s might be the brand of popping corn, type of oil, and shape of the popping vessel. Continuous X’s might be the amount of oil, amount of popping corn, cooking time, and cooking temperature. Specific settings for each of the input X’s are selected and incorporated into the statistical experiment.
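Setting up such an experiment starts with the run list. The sketch below builds a full factorial design, with every combination of factor levels, and shuffles the run order, following the earlier advice to randomize testing. It uses four of the popcorn factors at two levels each; the factor names and levels are hypothetical.

```python
"""Full factorial design with a randomized run order (a sketch)."""
import itertools
import random


def full_factorial(factors, seed=None):
    """All combinations of factor levels, returned in randomized run order."""
    names = list(factors)
    runs = [dict(zip(names, levels)) for levels in itertools.product(*factors.values())]
    random.Random(seed).shuffle(runs)  # randomize to spread time-dependent effects
    return runs


# Hypothetical two-level settings for four of the popcorn factors.
popcorn_factors = {
    "brand": ["A", "B"],
    "oil_type": ["canola", "coconut"],
    "oil_amount_tbsp": [1, 2],
    "cook_time_s": [150, 180],
}
runs = full_factorial(popcorn_factors, seed=1)
```

A full factorial grows quickly: all seven popcorn factors at two levels would need 2^7 = 128 runs, which is why fractional factorial designs are often used in practice.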