Tuesday, March 17, 2015

Significance and Chi-Squared Testing

Part 1

*Note: in the 'z/t value' section, a +/- indicator means there are two critical values, one above and one below the mean (0) point. Values without +/- in front have only one critical value, which may be in either the + range or the – range, but not both.

2. This problem tests whether the populations of three invasive insects in Buck County differ from the estimated population numbers per field for the entire county. The null hypothesis is that there is no difference between the invasive species populations in the 50 fields sampled from Buck County and the estimated values. The alternative hypothesis is that there is a difference. The z-scores of the sampled insect populations from the 50 fields were calculated and compared against the critical values of 1.96 and -1.96, which come from a two-tailed test at the 95% confidence level. Each z-score fell outside that range, so the null hypothesis is rejected for each insect. In conclusion, the 50 sampled fields in Buck County have more Asian Long-Horned Beetles (z = 2.47) and Emerald Ash Borer Beetles (z = 7.08), and fewer Golden Nematodes (z = -7.76), than the predictive model for the county estimates.
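The decision rule described above can be sketched in a few lines of Python. This is only an illustration using the z-scores reported in this post; the function names are my own, and the critical value is derived from the standard normal distribution rather than read from a table.

```python
from statistics import NormalDist


def two_tailed_critical_z(confidence=0.95):
    """Return the positive critical z for a two-tailed test."""
    alpha = 1.0 - confidence
    # Half of alpha sits in each tail, so look up the 1 - alpha/2 quantile.
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)


def reject_null_two_tailed(z, confidence=0.95):
    """Reject when z falls beyond either critical value (+ or -)."""
    return abs(z) > two_tailed_critical_z(confidence)


# z-scores as reported in the write-up above
reported_z = {
    "Asian Long-Horned Beetle": 2.47,
    "Emerald Ash Borer": 7.08,
    "Golden Nematode": -7.76,
}

critical = two_tailed_critical_z()
for species, z in reported_z.items():
    decision = "reject" if reject_null_two_tailed(z) else "fail to reject"
    print(f"{species}: z = {z:+.2f}, critical = ±{critical:.2f} -> {decision} the null")
```

The sign of each z-score then tells you the direction: positive means more insects than estimated, negative means fewer.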

3. This problem compares the size of all parties that attended a park in 1960 with a sample of 25 parties from 1985. The null hypothesis is that there is no difference in overall group size between the two time periods; the alternative hypothesis is that there is a difference. To test the null hypothesis, we compared the t-score of the sample data to the critical value for a one-tailed test at the 95% confidence level. A t-score was used for this data set because the number of observations is below 30. The t-score of the sample data was 4.92, and the critical value at the 95% confidence level (24 degrees of freedom) was 1.711. Because the t-score is higher than the critical value, we reject the null hypothesis. In conclusion, these results indicate that if you were to randomly sample one party from each time period, the party from 1985 would be more likely to have more group members.
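As a rough sketch of the calculation behind this problem, the one-sample t statistic and the one-tailed comparison might look like the following. The function names are hypothetical, and the means and standard deviation from the assignment are not repeated in this post, so only the reported t-score and critical value are plugged in.

```python
import math


def one_sample_t(sample_mean, hypothesized_mean, sample_sd, n):
    """Return (t, degrees of freedom) for a one-sample t-test."""
    standard_error = sample_sd / math.sqrt(n)
    t = (sample_mean - hypothesized_mean) / standard_error
    return t, n - 1


def reject_null_upper_tail(t, critical_t):
    """One-tailed (upper) test: reject only when t exceeds the critical value."""
    return t > critical_t


N_PARTIES = 25        # sample size from the write-up
REPORTED_T = 4.92     # t-score reported above
CRITICAL_T = 1.711    # one-tailed, 95% confidence, df = 24, as reported above

print("degrees of freedom:", N_PARTIES - 1)
print("reject null:", reject_null_upper_tail(REPORTED_T, CRITICAL_T))
```

With the actual 1960 mean, the 1985 sample mean, and the sample standard deviation, `one_sample_t` would reproduce the 4.92 figure.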

Part 2: Introduction

For part two of this assignment, students are to choose three variables and compare their prevalence between the southern and northern counties of Wisconsin. The three variables I chose to investigate (all per county) were ATV trail mileage, number of non-resident gun-deer permits, and number of non-resident 15 day fish licenses. I chose these variables because northern Wisconsin is often associated with rustic wildlife and outdoor fun, and I want to see whether the patterns in human behavior and the attributes of the land coincide with this idea. The null hypothesis is that there is no real difference in these variables between the counties in the north of the state and the counties in the south. Conversely, the alternative hypothesis is that there is a difference between north and south for the variables selected. To test the null hypothesis, the data provided will be mapped to show the spatial distribution of the measured variables. Subsequently, Chi-Squared tests will be used to either reject or fail to reject the null hypothesis for each variable.
Figure 2: A visual reference for how the northern counties and southern counties relate to Highway 29


Methods

The first step in preparing the data for further analysis is to create all the necessary layers in ArcMap. To begin, we must join the data in the SCORPARCGIS table, provided as an Excel spreadsheet, to a shapefile of Wisconsin counties. The join was based on the county field in both the counties shapefile and the SCORP table; its cardinality was one-to-one, and it matched all 72 counties. The next step was to add four fields to the joined table. The first added field delineates whether a county is north of Highway 29 (value 1) or south of Highway 29 (value 2). The result was an even split of 36 counties in each of the north and south portions. The other three fields were used to classify my selected variables on a scale of 1, 2, 3, 4. The higher the ranking, the more of that variable there is in that county. The ranking was based on which category a county fell into when symbolized in a choropleth map using a natural breaks classification with four classes. Once the values were entered in the new fields for all the counties, the next step was to create the maps and cross-tab reports necessary to draw conclusions about the null hypotheses.
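The ranking step was done by hand in ArcMap, but the same logic can be sketched in Python. Everything in this example is hypothetical: the break values, the county names, and the mileages are placeholders, not the natural breaks or data from the SCORP table.

```python
from bisect import bisect_left

NORTH, SOUTH = 1, 2  # same coding as the Highway 29 field described above


def rank_from_breaks(value, upper_bounds):
    """Return a rank of 1-4 given the sorted upper limits of classes 1-3.

    A value equal to an upper limit stays in that class; anything above
    the last limit falls in class 4.
    """
    return bisect_left(upper_bounds, value) + 1


# HYPOTHETICAL class limits standing in for the natural breaks from ArcMap
HYPOTHETICAL_ATV_BREAKS = [10.0, 50.0, 120.0]

# HYPOTHETICAL rows standing in for the joined county table
counties = [
    {"name": "County A", "region": NORTH, "atv_miles": 135.0},
    {"name": "County B", "region": NORTH, "atv_miles": 48.0},
    {"name": "County C", "region": SOUTH, "atv_miles": 0.0},
]

for county in counties:
    county["atv_rank"] = rank_from_breaks(county["atv_miles"], HYPOTHETICAL_ATV_BREAKS)
    print(county["name"], county["region"], county["atv_rank"])
```

The region code and the rank together are the two columns the cross-tab report needs for each variable.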
Results
The three maps created represent, respectively, the distribution of ATV trails (miles), non-resident gun-deer licenses, and non-resident 15 day fishing licenses across the counties of Wisconsin. The colored categories in the legend, and the numbers associated with them, are the basis for the four categories discussed in the methods section of this report. One of the more important things these maps convey is that, for every variable, the counties in the highest category (dark green, category 4) are almost entirely in the north. Other than that observation, however, the maps do not fully indicate whether there is a significant difference between the two parts of the state. We must do further analysis with the Chi-Squared tests to see if there is a significant difference in the spatial occurrence of these variables across the north and south divide.



After conducting the Chi-Square operation on the three selected variables, the following charts were produced. To reject the null hypothesis, the Pearson Chi-Square value had to exceed the critical value of 7.81, which corresponds to a 95% confidence level with 3 degrees of freedom (2 regions and 4 ranked categories give (2 - 1) × (4 - 1) = 3).

Non-resident 15 day fish license score = 12.0
ATV trail mileage score = 15.0
Non-resident gun-deer license score = 6.6

Given these results, the null hypothesis is rejected for both ATV trail mileage and non-resident 15 day fish licenses. Conversely, we fail to reject the null hypothesis for non-resident gun-deer licenses. In the tables below, the values to pay attention to in the first box of each section are the ones in the first row. The Chi-Squared value for each factor is already listed above; the third value is the significance, and subtracting it from 1 gives the confidence with which you can say there is a difference between the northern and southern counties. The second box in each section displays the difference between the expected values, based on random selection, and the observed values for each ranked category in the north and the south.
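To show where numbers like these come from, here is a minimal sketch in Python. The first function builds the expected counts and the Pearson statistic from a north/south by rank cross-tab; the second converts a Chi-Squared value into a significance for exactly 3 degrees of freedom using a closed-form expression. The function names are my own, and the cross-tab counts are HYPOTHETICAL placeholders, since the real tables appear only as images. The three scores are the ones reported above.

```python
import math


def pearson_chi_square(observed):
    """Return (statistic, degrees of freedom) for a table of observed counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(observed):
        for j, count in enumerate(row):
            # Expected count if region and rank were unrelated
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (count - expected) ** 2 / expected
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return statistic, df


def chi_square_p_value_df3(statistic):
    """Upper-tail probability of a Chi-Squared value; valid only for 3 df."""
    return (math.erfc(math.sqrt(statistic / 2.0))
            + math.sqrt(2.0 * statistic / math.pi) * math.exp(-statistic / 2.0))


# Scores reported above
reported_scores = {
    "Non-resident 15 day fish license": 12.0,
    "ATV trail mileage": 15.0,
    "Non-resident gun-deer license": 6.6,
}
for name, score in reported_scores.items():
    p = chi_square_p_value_df3(score)
    decision = "reject" if p < 0.05 else "fail to reject"
    print(f"{name}: chi-square = {score}, p = {p:.3f}, confidence = {1 - p:.3f} -> {decision}")

# HYPOTHETICAL cross-tab: rows are north (1) and south (2), columns are ranks 1-4
HYPOTHETICAL_TABLE = [
    [5, 9, 10, 12],
    [15, 11, 8, 2],
]
print(pearson_chi_square(HYPOTHETICAL_TABLE))
```

For the fish license score of 12.0 this gives a significance of about .007, which matches the value shown with the first table below.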


1.  Non-resident 15 day fish license

Significance (third value) = .007



NOTE: 1 = NORTHERN COUNTIES, 2 = SOUTHERN COUNTIES






 2.  ATV trail miles



















3.  Non-resident gun-deer licenses

















Conclusion

If finding a sense of "Up-North" is the goal, we may have found it. I believe this because two of my three factors showed a much stronger prevalence in the north than in the south. When I created the maps I suspected there was a spatial difference, and running the Chi-Square tests confirmed that there is a difference between what is observed and what is expected. In other words, something in the north is creating the conditions for a higher occurrence of 15 day fishing licenses and more ATV trails. The higher prevalence of non-resident fishing licenses implies that the north is where the most attractive fishing in the state is: attractive in the sense of the number of lakes, the variety and availability of fish, and even the overall surroundings. That people from out of state are traveling all the way up to the northern part of the state implies that there is a more rustic and natural vibe "Up-North". It is not as convincing an argument, but the higher number of ATV trail miles further suggests that the northern counties of Wisconsin are the best destination for outdoor recreation in the state. Finding that the non-resident deer licenses were more evenly spread out than the other factors did not surprise me. Unlike lakes and trails, deer populations have a much higher degree of mobility, and thus are more randomly spread out. If you eliminated the outliers in the northwestern counties that border Minnesota, you might even see more hunters in the south than in the north. The Chi-Squared values were very helpful in confirming the suspicion raised visually by the maps, and they cemented my conclusion that the physical landscape conditions are different in the northern part of the state, leading to the rejection of the null hypothesis for two of the three variables.
