Test Validity Evidence Matrix

**Click on the Edit button in the top right corner to add comments to the matrix. Please put your name in parentheses after your comments. Be sure to click Save in the top right corner when you are finished adding comments.**
**Appendix A: Test Validity Evidence Matrix**


The matrix is organized by the //**Standards**// categories of validity evidence: Content, Response Process, Internal Structure, Relationship to Other Variables, and Consequential. Comments on each standard appear under four headings: //Validity Evidence: MMY Review//, //Reviewer's Evaluation of Validity//, //Your Evaluation of Reviewer's Evaluation//, and //Additional Information?//
**Content**

//Validity Evidence: MMY Review//

Mentioned by both reviewer 1 and 2. (Miranda E.)

//Reviewer's Evaluation of Validity//

Reviewer 1: States that content validity is based on the 1983 version rather than the 1993 version and has no validity coefficients to report. Reviewer 2: Content validity is based on existing measures of anxiety and confidence toward math. (Miranda E.)

Reviewer 1: Ciechalski, J.C., states that there is evidence of the content, but as Miranda stated, it is based on the 1983 version of the MSES; none is reported for the 1993 version. Reviewer 2: Smith, E.V., reported that evidence is based on a review of existing measures of mathematics anxiety and confidence. (Kristina Artino)

Langenfeld and Pajares stated that their study supported the idea that the MSES is a reliable and valid scale for measuring mathematics self-efficacy. However, the results indicated that the MSES and its three scales could be revised to strengthen the test overall. (Jenna James)

R1 states that evidence of content validity is reported, but only for the 1983 version, not the 1993 version. R2 goes more in depth in explaining how the content was based on a review of existing measures. (Jen M)

//Your Evaluation of Reviewer's Evaluation//

Reviewer 1: Not thorough, but it was easier to locate information with the capitalized headings. Reviewer 2: Difficult to follow without headings, but more information. (Miranda E.)

Reviewer 1: I think the reviewer was clear in describing the administration of the MSES, and the scoring is easy. I also think the reviewer clearly stated that all of the information contained in the manual was from 1983 and has not been updated since. Reviewer 2: I think this reviewer had good information regarding the MSES; however, I did not see where he stated that most of the validity evidence was from 1983. He did use references at the end, where reviewer 1 did not. (Kristina Artino)

I believe that Langenfeld and Pajares showed evidence for stating that the MSES could be revised to strengthen the test overall. For example, the reviewers believed that because the MSES and its three scales have not undergone validation analyses beyond the usual reporting of Cronbach's alpha, the MSES could be revised to strengthen the test overall. (Jenna James)
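Since Jenna's point turns on what Cronbach's alpha does and does not establish, here is a minimal sketch of how the statistic is computed, using made-up Likert-type responses rather than actual MSES data:

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Cronbach's alpha for an (n_respondents x k_items) score matrix:
    alpha = k/(k-1) * (1 - sum(item variances) / variance(total score))."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)        # sample variance of each item
    total_var = items.sum(axis=1).var(ddof=1)    # variance of each person's total score
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Made-up responses: 5 people answering 4 Likert-type items
scores = np.array([
    [4, 5, 4, 4],
    [2, 3, 2, 3],
    [5, 5, 4, 5],
    [1, 2, 2, 1],
    [3, 3, 4, 3],
])
print(f"alpha = {cronbach_alpha(scores):.2f}")
```

Alpha only summarizes internal consistency; a high value does not by itself show that the scale measures one construct, which is why the reviewers call for validation analyses beyond it.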

While R1 was easier to follow and concise, and did include information about the differences between the 1983 and 1993 versions, I found the information provided by R2 to be more relevant to someone considering whether the MSES would be a suitable instrument for their purposes. Including more information on what evidence of content validity is offered could help a prospective user feel more confident about their choice. (Jen M)

//Additional Information?//

Concern about content validity being from 1983 rather than 1993. (Miranda E.)

Concerning this review of the MSES, I would agree with the reviewers that the scoring is easy and it does not take long to administer. However, it is rather alarming that it had not been reviewed since 1983 and this review was done in 1993. Lots can change in that time frame. (Kristina Artino)

Reviewer 2 recommends that future versions of the MSES include a more representative norming sample, additional psychometric studies investigating the dimensionality of the responses, and attention to the other recommendations mentioned in his review. (Jenna Reed)

Jenna Reed brings up a fantastic issue regarding the dimensionality of a test. In general, when creating a test, one seeks a feature called "unidimensionality": the test one creates should reflect one and only one objective. There have been tremendous strides in the field of Item Response Theory (IRT) that address the unidimensionality of an instrument. For example, a test purported to measure anxiety should reflect just anxiety, not anxiety and field dependence or anxiety and attitude. The lack of unidimensionality seriously attenuates the findings of any test battery, and it is the investigator's responsibility to interpret correctly what the scores mean. (Steven H)
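To make the unidimensionality idea concrete: short of a full IRT analysis, one rough screen is to examine the eigenvalues of the inter-item correlation matrix; when the first eigenvalue dwarfs the rest, a single dimension plausibly accounts for most of the shared variance among items. A sketch on simulated data (not MSES responses):

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data: 200 respondents, 6 items that all load on a
# single latent trait, plus item-specific noise
latent = rng.normal(size=(200, 1))
items = latent + 0.5 * rng.normal(size=(200, 6))

corr = np.corrcoef(items, rowvar=False)      # 6 x 6 inter-item correlation matrix
eigvals = np.linalg.eigvalsh(corr)[::-1]     # eigenvalues, largest first

# A dominant first eigenvalue is consistent with (though not proof of)
# a unidimensional item set.
print("eigenvalues:", eigvals.round(2))
print("1st/2nd ratio:", round(eigvals[0] / eigvals[1], 1))
```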

A great point was given by Jenna J.: if the findings for validity were strengthened, the MSES could be revised and used for sufficient testing in the future. (Chelcee S.)

**Response Process**

**DOES NOT APPLY**

This type of evidence does not apply because it was not considered until the 1999 Standards were published. Reviews before 1999 will not have used this language. (Dr. KB)

**Internal Structure**

//Validity Evidence: MMY Review//

Validity in general was noted by both reviewer 1 and 2. (Caitlyn T.)

//Reviewer's Evaluation of Validity//

Reviewer 1: Limited information was noted; gave a basic statement of validity but lacking information on internal structure. Reviewer 2: Goes into great detail about validity and details the structure of the MSES using group techniques. (Caitlyn T.)

//Your Evaluation of Reviewer's Evaluation//

Reviewer 1: Not thorough on content for 1993, but headings made it reader friendly. Reviewer 2: More information was given; however, it was not reader friendly and was disorganized. (Caitlyn T.)

I agree with Caitlyn T. that the information provided by Reviewer 2 was more thorough and provided more evidence when it came to the validity, but I had a difficult time following the content. (Haley B.)

It was evident that both researchers conducted a thorough review of the self-efficacy scale. It was clear that R1 was succinct and stuck to the facts of the research in comparing the 1983 model to the 1993 update. R2, though giving many descriptors of the findings, was very difficult to pick apart for the actual research and important information dealing with the evaluation of the MSES. (Chelcee S.)
**Relationship to Other Variables**

//Validity Evidence: MMY Review//

R1 and R2 report that concurrent evidence is reported in the 1983 version. (Holly R.)

R1 and R2 reported evidence of concurrent coefficients for the 1983 version, but no validity coefficients are reported for the 1993 version. (Rebecca S.)

//Reviewer's Evaluation of Validity//

R1 did not give a further evaluation of validity evidence other than reporting that concurrent evidence is reported. R2 reported that concurrent evidence is supported by using correlations with other, similar scales of attitudes toward math. R2 reported r values of .56, .66, .47, and .46, which are mostly unacceptable on the reliability statistic scale. (Holly R.)

R1: Other than reporting that concurrent validity is reported, no further evaluation of concurrent validity was provided. R2: Reported evidence for concurrent validity is supported using correlations with other measures of attitude toward math. The total score was found to be correlated with math anxiety at r = .56, confidence in doing math at r = .66, perceived usefulness of math at r = .47, and effectance motivation in math at r = .46. These scores are unacceptable (data interpreted using the degree-of-reliability scale). (Rebecca S.)
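For readers wanting to see how such coefficients are read, the sketch below tabulates the r values Rebecca lists and attaches magnitude labels; the .60/.80 cutoffs are illustrative assumptions, not thresholds taken from either review:

```python
# The r values Reviewer 2 reports for the 1983 MSES total score
reported = {
    "math anxiety": 0.56,
    "confidence in doing math": 0.66,
    "perceived usefulness of math": 0.47,
    "effectance motivation in math": 0.46,
}

# Illustrative magnitude cutoffs (an assumption, not from the reviews)
def label(r: float) -> str:
    if abs(r) >= 0.80:
        return "strong"
    if abs(r) >= 0.60:
        return "moderate"
    return "weak"

for scale, r in reported.items():
    print(f"{scale}: r = {r:.2f} ({label(r)})")
```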

I concur with Rebecca S. on this issue. More needed to be said about the discrepancies between these correlations, which indicate validity. Which validity coefficients were used? How were all these scores calculated? (Steven H.)

//Your Evaluation of Reviewer's Evaluation//

R1 did not give enough evidence to support validity. R2 gave a more detailed report overall of validity evidence, and the 1983 version does not seem to show high validity. (Holly R.)

R1: Did not give evidence to support the coefficients reported for the 1983 version, although I felt their evaluation was more organized.

R2: Their evaluation was detailed, but the information was unorganized. They did provide detailed information to support evidence of validity, although given all the details in the information, my evaluation would determine the validity to be unacceptable. (Rebecca S.)
**Consequential**

**DOES NOT APPLY**

This type of evidence does not apply because it was not considered until the 1999 Standards were published. Reviews before 1999 will not have used this language. (Dr. KB)



__**Extended Comments:**__

__**Questions for Dr. KB:**__

Hi Dr. KB, I have a question about the value of the reliability statistic. Is the r value used in both reliability and validity evidence, and is the 0.00 to 1.00 range chart used for both, not just reliability? Thanks! Holly

From Dr. KB: This is a great question, Holly. There are a few different types of r statistics. Fortunately, nearly all of the r statistics are interpreted in the same manner. For reliability estimates of internal consistency, the r statistic is interpreted exactly as you described. Likewise, for evidence of internal structure validity, the interpretation is the same. For evidence of relationship-to-other-variables validity, an r statistic is often reported. This r statistic is not a measure of internal consistency but instead is a correlation statistic that demonstrates how strongly two sets of data are related (scores on the instrument and scores on the other variable). The good news is that r as a correlation statistic is interpreted in the same manner, with larger values demonstrating that the correlation is stronger.
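For reference, the correlation form of r that Dr. KB describes here is the Pearson product-moment coefficient, computed from paired scores (X_i, Y_i) as:

```latex
r_{XY} = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}
              {\sqrt{\sum_{i=1}^{n} (X_i - \bar{X})^2} \, \sqrt{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}}
```

One caveat worth noting: unlike a reliability coefficient, which runs from 0.00 to 1.00, a correlation can range from -1 to +1, so the "larger is stronger" reading applies to its absolute value.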


__**Feedback from Dr. KB:**__

Miranda and Caitlyn - You've pointed out some important information in the reviews about evidence of content validity provided for the 1983 version but not for the 1993 version. When we get to the step in Project #2 where we look for additional information about validity and reliability, it will be interesting to see if there are other publications that provide validity information on the 1993 version.