Test Reliability Evidence Matrix

 * **Click on the Edit button in the top right corner to add comments to the matrix.** Please put your name in parentheses after your comments.
 * **Project #4 Appendix B: Test Reliability Evidence Matrix**
 * **Be sure to click Save in the top right corner when you are finished adding comments.**


 * = **Reliability in MMY Review** ||= **Value of Reliability Statistic** ||= **Reviewer’s Evaluation of Reliability** ||= **Your Evaluation of Reviewer’s Evaluation** ||= **Additional Information?** ||
 * Reviewers 1 and 2 provide reliability coefficients for the 1983 version, not the 1993 version. (Miranda E.) || Reviewer 1: Total scale .96; subscales .92, .96, and .92. Coefficient alpha reported as .92 and test-retest reliability as .94. Reviewer 2: Estimates were above .90. (Miranda E.) || Values are listed by each reviewer, but no conclusions are provided. (Miranda E.) || It is hard to draw conclusions about the reliability data without further explanation of how to interpret the results. (Miranda E.) ||  ||
 * R1 and R2 provided reliability coefficients, but only for the 1983 version. (Holly R.) || R1: Total scale value of .96; Math tasks .92; Problems .96; Courses .92. Coefficient alpha .92. Test-retest .94. R2: The total, math tasks, problems, and courses scales are all reported above .90 (we are not given specific values for each). A modified version had a test-retest coefficient of .94. The test-retest coefficients for a Japanese version were much lower, at .68, .72, and .75 for the subscales. (Holly R.) || R1 and R2 did not report a final evaluation of reliability beyond the values listed and the observation that there is an issue because they do not have information for the 1993 version. (Holly R.) || R1: All of the reliability coefficients reported by R1 are in the very strong range; therefore, my evaluation is that R1's data for the 1983 version show strong reliability. R2: The reliability coefficients other than those for the Japanese version are in the very strong range; therefore, R2's data for the 1983 version show strong reliability. The coefficients for the 1983 Japanese version are in the somewhat weak to satisfactory range and therefore show that level of reliability. (Holly R.) ||  ||   ||
 * R1 and R2 both provided reliability coefficients for the 1983 version only. (Rebecca S.)

As stated in the previous comments, the reviewers provided reliability coefficients (Kristina Artino).

As stated in previous comments, the reviewers provided reliability coefficients (Jenna James). || R2: Reported estimates consist of internal consistency and test-retest coefficients. The internal consistency estimates were all above .90 for the original 3 scales and the total scale. A modified version of the Math-Related Subjects scale (2-week test-retest) had a coefficient of .94, and a Japanese version (4-week test-retest) had coefficients of .68, .72, and .75 for the math tasks, math problems, and math-related school subjects scales. (Rebecca S.)

R1: The total scale is .96, and the three subscales are .92, .96, and .92. A test-retest reliability of .94 over a two-week interval was also reported (Kristina Artino). R2: All internal consistency estimates were above .90 for the original three scales. There was also a modified version of the Math-Related School Subjects scale, which had a 2-week test-retest coefficient of .94, and a Japanese version, which had a 4-week interval and a coefficient of .68 (Kristina Artino).

R1: The total scale value was .96, Math Tasks was .92, Problems was .96, Courses was .92, coefficient alpha was .92, and test-retest was .94 (Jenna James). R2: The total, Math Tasks, Problems, and Courses values were all reported the same as above. However, the test-retest value was .94, and the test-retest value for a Japanese version was .68, which was much lower (Jenna James). || R1 and R2 did not report a final evaluation, but they noted that the reliability coefficients reported in the manual are all based on the 1983 version. (Rebecca S.)

R1: The reviewer reported that all reliability estimates were based on the 1983 version and that no reliability coefficients are reported for the 1993 version. R2: The reviewer reported that there was a significant omission of information regarding the standard error of measurement, which would be essential for interpreting individual scores (Kristina Artino).

As stated in previous comments, the reviewers did not report their final evaluation of reliability. However, the reviewers did mention that the reliability estimates were based on the 1983 version rather than the 1993 version (Jenna James). || R1: Looking at the internal consistency reliability coefficients provided (1983 version only), my evaluation shows a strong degree of reliability. This is based on the reliability statistic interpretation guidelines: the value of r is .96, which falls between .90 and .99. R2: My evaluation of the reliability coefficients shows a very strong degree of reliability for the 1983 version and the modified version. This is based on the value of r being above .90 for the 1983 version and .94 for the modified version. However, the coefficients from the Japanese version show a satisfactory degree of reliability. (Rebecca S.)

R1 and R2 both show strong reliability for the 1983 version, with coefficients between .90 and .99. However, the Japanese version was considerably lower at .68, which is much less than the other values but is still considered satisfactory (Kristina Artino).

It is hard to draw conclusions from the reliability data without more information about how the reviewers interpreted the results (Jenna James). ||  ||
 * || R1 includes the authors' report of .96 for the Total Scale, .92 for the Math Tasks and Courses subscales, and .96 for the since discarded Problems subscale; these very strong reliability estimates are provided only for the 1983 version. R2 echoes that the total scale and original three scales were above .90 and includes information on a modified version with a 2-week interval (.94) and a Japanese version with a 4-week interval in which the coefficients are only satisfactory. (Jen M) || The reviewers omit a final evaluation of reliability. (Jen M) ||  ||   ||
 * As stated above, both reviewers gave reliability statistics (Chelcee S.) || R1 states the reliability coefficients for the three subscales of the MSES: .92, .96, and .92. Also included are the coefficient alpha of .92 and the test-retest reliability of .94. R1 then notes that reliability coefficients are not reported for the revised 1993 version of the MSES.

R2 states that reliability coefficients were reported but does not specifically state what they are. He does, however, report on the Japanese version of the test and its trial periods, with test-retest coefficients of .68, .72, and .75. He also states that there is a large omission of information needed to make valid interpretations of scores from the MSES. (Chelcee S.) || Both reviewers only describe the coefficients that were gathered and did not give a final judgment on the reliability of the MSES in general. (Chelcee S.) || R1 gave a clear review of the data described in the report, with specific coefficients to back up what was stated. R2, though the data were summarized and reported, did not provide evidence as thorough by comparison. (Chelcee S.) ||  || When we compare the reliability critiques from the two reviews, we need to look at each reviewer's specific background. Cichalski is a professor of Counselor and Adult Education. By contrast, Everett Smith is an assistant professor of Educational Psychology. Based on this background information, one would expect Smith to have a more theoretical critique than Cichalski. Upon comparing the critiques, we definitely see that Smith cites more correlations regarding the validity evidence, which he breaks down into content, construct, and concurrent validity. (Steven H.)
 * **__Extended Comments__:**
 * **__Extended Comments__:**
 * **__Questions for Dr. KB__:**

__**Feedback from Dr. KB:**__ Miranda - I agree that the MMY reviews would have been stronger if the reviewers had provided an interpretation of the alpha values, but some guidelines for interpreting reliability are presented in PPT1.4. For Project #2 I will expect everyone to apply these guidelines to interpret reliability statistics reported in MMY or other publications.