
Abstract

Information technologies of knowledge evaluation in electronic test systems


Introduction

Testing systems entered the educational practice of many countries long ago. Every year they spread further through education systems around the world, because they make result processing convenient and allow assessing a large amount of knowledge in a short time. Computers make testing even faster and more comfortable. But electronic testing systems inherited one of the weaknesses of classic testing: in many cases tests do not allow an adequate evaluation of the learner's knowledge. This weakness can be addressed in several ways. One of them is additional answer processing: while full answers can be evaluated with some maximal grade, partial answers can be treated in many ways. It is also possible to introduce additional elements affecting the result, such as self-assessment. The second way is to change the very structure of the classic testing system, that is, to change the connections between tasks, answers and assigned grades. This can be achieved by using concept maps and adaptive testing based on them.

Aim. It is necessary to maximize the quality of knowledge assessment without overloading the test task set with specifying and controlling elements. To achieve this, one must choose, research and possibly improve one of the existing methods of testing. This means the following tasks have to be accomplished:

Relevance of research
Nowadays electronic testing systems are widely used in one form or another to make decisions crucial for people's professional lives. Test results are used in hiring and graduation decisions. Moreover, they are used at the government level, for example in independent external testing, which is computer-processed.

Planned practical results
The results of the research can be used to improve existing test systems by implementing modules for additional answer analysis, or to create new test systems whose architecture differs from the classic one.

Current state of problem
Avanesov V. in [1] proposes a grading method which implies using special sets of answer items. The grade assigned for every question can be 1, 0 or -1, depending on the chosen item. A pupil can finish the test with a negative number of points if he chooses answers which are logically opposite to the right ones or considered more wrong by teachers. This grading can only be applied to answers which have strong logical connections and respect the principle of implication [2]. This approach, according to his research, allows finding critical logical mistakes in the learner's knowledge.
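
For illustration only, below is a minimal sketch of such a scoring scheme: every answer item is assumed to carry a teacher-assigned weight of 1, 0 or -1, and the test total can become negative. The question labels and the name score_response are hypothetical; this is not claimed to be Avanesov's exact procedure.

```python
# Sketch of scoring with answer items weighted +1 / 0 / -1 (assumed labeling,
# not Avanesov's exact procedure): each option carries a teacher-assigned
# weight, and the total may become negative if the learner repeatedly picks
# items that are logically opposite to the correct ones.

ITEM_WEIGHTS = {          # hypothetical test: question id -> {option: weight}
    "q1": {"a": 1, "b": 0, "c": -1},
    "q2": {"a": -1, "b": 1, "c": 0},
}

def score_response(response: dict[str, str]) -> int:
    """Sum the weights of the chosen options over all answered questions."""
    return sum(ITEM_WEIGHTS[q][option] for q, option in response.items())

print(score_response({"q1": "c", "q2": "a"}))   # -2: logically opposite choices
print(score_response({"q1": "a", "q2": "b"}))   #  2: fully correct
```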

His research shows that a great deal of task efficiency depends on distractors, that is, incorrect answer items which distract the tested person. A good choice of distractors keeps the guessing probability at the 1/K level, where K is the number of answer items. This means good distractors must not be obviously wrong. If the system has tasks with answer items which are never chosen, it cannot provide adequate grades, because all calculations of guessing probabilities become wrong. Under many evaluation models, deleting a bad distractor from the answer list and recalculating the points lowers the average result.
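
A small sketch of how a "dead" distractor could be detected from response statistics and how it shifts the effective guessing probability away from 1/K; the 1% threshold and all names are illustrative assumptions, not taken from [1] or [2].

```python
# Sketch: flag distractors that are (almost) never chosen and show how the
# effective guessing probability moves away from 1/K once they are excluded.
# The 1% threshold and the function name are illustrative assumptions.

def effective_guessing_probability(choice_counts: dict[str, int],
                                   threshold: float = 0.01) -> float:
    """Probability of guessing right when only 'plausible' options are counted."""
    total = sum(choice_counts.values())
    plausible = [opt for opt, n in choice_counts.items() if n / total >= threshold]
    return 1.0 / len(plausible)

# Four options (K = 4), but option "d" is never chosen by anyone:
counts = {"a": 120, "b": 45, "c": 35, "d": 0}
print(1 / len(counts))                          # nominal guessing level: 0.25
print(effective_guessing_probability(counts))   # effective level: ~0.333
```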

It is important to mention the contribution of Karpova I. to the development of models of partial answer evaluation. She proposed the Delta-method (D-method) of evaluation, which does not require deep expert knowledge. The basis of the method is a set similarity function, which is the inverse of the distance between the set of given answers and the set of right answers:

S = K_A / (L_E + K'),                                    (1)

where L_E is the size of the etalon set, K_A is the number of answer elements included in the etalon set, and K' is the number of answer elements not included in it. This value lies within [0, 1] and becomes smaller both when extra items are present and when items are absent. When the order of items in the answer is important, the answer is represented as a list and a procedure for establishing list similarity [3] is used. To compare two lists, bubble sorting [4] is used. The maximum number of rearrangements K_n for a list of length n is:

K_n = n(n - 1) / 2,                                 (2)

This defines list similarity as:

S_list = 1 - K_i / K_n,                                 (3)

where K_i is the number of rearrangements in the answer list. List comparison is performed in two stages: on the first stage the lists are compared as sets, and on the second the order of their items is compared. The two evaluation results from these stages are then combined using one of the available mathematical operations to obtain the final grade (pic. 1).

Picture 1 (animation, 52 frames, 10 loops). Word comparison with the D-method

It is believed that the D-method model can also be applied to fill-in questions [5] and tables [6].
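
For concreteness, below is a minimal sketch of the two-stage comparison, under two assumptions: that formula (1) has the form K_A / (L_E + K'), and that the set and order scores are combined by multiplication (the method itself allows other operations); all identifiers are illustrative.

```python
# Sketch of the two-stage D-method comparison described above.
# Assumptions: set similarity is computed as K_A / (L_E + K') (reconstruction
# of formula (1)); the order stage counts bubble-sort swaps needed to bring
# the correctly chosen items into the etalon order (formulas (2)-(3)); the two
# scores are combined by multiplication, which is only one possible operation.

def set_similarity(answer: list[str], etalon: list[str]) -> float:
    l_e = len(etalon)                                      # L_E: size of the etalon set
    k_a = len([x for x in answer if x in etalon])          # K_A: answer items in the etalon
    k_extra = len([x for x in answer if x not in etalon])  # K': extra items
    return k_a / (l_e + k_extra)

def order_similarity(answer: list[str], etalon: list[str]) -> float:
    common = [x for x in answer if x in etalon]
    if len(common) < 2:
        return 1.0                                 # nothing to reorder
    target = {item: pos for pos, item in enumerate(etalon)}
    seq = [target[x] for x in common]
    k_n = len(seq) * (len(seq) - 1) // 2           # K_n = n(n-1)/2, formula (2)
    k_i = 0                                        # K_i: bubble-sort swaps, formula (3)
    for i in range(len(seq)):
        for j in range(len(seq) - 1 - i):
            if seq[j] > seq[j + 1]:
                seq[j], seq[j + 1] = seq[j + 1], seq[j]
                k_i += 1
    return 1.0 - k_i / k_n

def d_method_grade(answer: list[str], etalon: list[str]) -> float:
    return set_similarity(answer, etalon) * order_similarity(answer, etalon)

etalon = ["parse", "analyze", "grade", "report"]
print(d_method_grade(["parse", "grade", "analyze", "report"], etalon))  # order error: ~0.83
print(d_method_grade(["parse", "analyze", "extra", "grade"], etalon))   # extra + missing: 0.6
```
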
The quality of assessment can be improved not only by various methods of answer evaluation, but also by improving the task sets. Anohina A. in [7] reviews many cognitive aspects of learning and proposes using concept maps to improve the structure of the tested knowledge field. Concept maps are represented by graphs which have concepts as vertices and connections between them as edges. Linking words or phrases such as "includes" or "produces" are also used. Concept maps can have different topologies [8]. Visually, the most general concepts are usually placed at the top of the map [9]. A system which uses concept maps can be described by defining three of its parts: the tasks which allow the learner to prove he understands the concepts, the methods of solving those tasks, and the method of evaluating the learner's concept map [10]. Learning can be high-directed or low-directed, which influences the predefined structure of the map and of the concepts and connections. It is believed that the process of concept map construction involves much more complex cognitive processes, according to Bloom's taxonomy [11], than classic testing does [12].
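
As a concrete illustration of the data structure only (not the evaluation scheme from [7] or [10]), a concept map can be stored as a set of propositions of the form "concept - linking phrase - concept", and a learner's map can then be compared with an expert's one, for instance as the share of expert propositions that were reproduced:

```python
# Sketch: a concept map as a set of propositions (concept, linking phrase, concept)
# and a naive score equal to the share of expert propositions the learner reproduced.
# This illustrates the data structure only, not the scoring used in [7] or [10].

Proposition = tuple[str, str, str]   # (concept, linking phrase, concept)

expert_map: set[Proposition] = {
    ("testing system", "includes", "task bank"),
    ("testing system", "includes", "grading module"),
    ("task", "produces", "grade"),
}

learner_map: set[Proposition] = {
    ("testing system", "includes", "task bank"),
    ("task", "produces", "grade"),
    ("grade", "includes", "task bank"),          # wrong proposition
}

def map_score(learner: set[Proposition], expert: set[Proposition]) -> float:
    """Fraction of expert propositions reproduced by the learner."""
    return len(learner & expert) / len(expert)

print(map_score(learner_map, expert_map))        # 2 of 3 propositions -> ~0.67
```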

The system can also use different types of connections, such as "more important" and "less important". Adaptive testing systems can be based on concept maps [13]. They determine how hard it is for the learner to create a concept map and help him.

There is one more way of quality improvement. It is described in paper [14] by Darwin P. Hunt. In his paper he discusses the meaning of personal knowledge and arrives at a definition of knowledge as a special type of belief. That is why knowledge cannot be separated from the level of confidence. Errors in test answers play a great role in the learner's professional development. A specialist would not base his actions on knowledge with a low level of confidence (forgotten, unknown, partial), but he can make confident mistakes if he bases his actions on false knowledge held with a high level of certainty. Such mistakes must be pointed out early in the process of learning. Self-assessment is a criterion used in special test systems: each test task gets an additional question about the learner's level of certainty. Research shows that students who receive tests with self-assessment spend much more time and effort on learning activities in order to demonstrate a high level of certainty. This happens because of the motivational elements of the system [15]. Self-assessment results can change the student's grade within a 3-5% interval, thus providing motivation: confident mistakes lower the grade, while confident right answers raise it. Self-assessment not only motivates, but also helps to see whether the material was retained. It also helps to find poorly formulated tasks, which show a low overall level of certainty. Research in Sweden in 2001 shows that self-assessment tests allow changing the difference in average grades between learners of different genders [16].
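
A minimal sketch of how a certainty question could adjust the grade within the 3-5% band mentioned above; the per-question weight is an assumption, since the sources only state the overall interval and the direction of the adjustment.

```python
# Sketch of a grade adjustment driven by self-assessed certainty: confident
# correct answers raise the grade, confident mistakes lower it, and the total
# adjustment is clipped to the 3-5% band mentioned above. The per-question
# weight of 1% is an illustrative assumption, not taken from [14]-[16].

def adjust_grade(base_grade: float, answers: list[tuple[bool, bool]],
                 max_shift: float = 0.05) -> float:
    """answers: (is_correct, is_confident) pairs; grades are on a 0..1 scale."""
    shift = 0.0
    for is_correct, is_confident in answers:
        if is_confident:
            shift += 0.01 if is_correct else -0.01
    shift = max(-max_shift, min(max_shift, shift))
    return max(0.0, min(1.0, base_grade + shift))

# Three confident correct answers, one confident mistake, one unconfident answer:
print(adjust_grade(0.80, [(True, True), (True, True), (True, True),
                          (False, True), (True, False)]))   # 0.82
```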

Conclusion. Methods of improving the quality of knowledge assessment were analyzed. The results show that many models allow evaluating partial answers without expert help. There are also ways of quality improvement related to the usage of expert knowledge in the process of learning and to changing the architecture of the test system. Finally, there are some efficient assessment methods which rely on additional specifying questions. Further research will show their reliability.

 

At the time of writing this abstract is incomplete. The full version will be available in February 2014 from the author or his scientific advisor.

Bibliography

  1. Аванесов В.С. Научные проблемы тестового контроля знаний / В.С. Аванесов — М.: Исследовательский центр проблем качества подготовки специалистов, 1994. — 135 с.
  2. Аванесов В.С. Композиция тестовых заданий / В.С. Аванесов, 3-е изд. — М.: Центр тестирования, 2002. — 240 с.
  3. Фор А. Восприятие и распознавание образов / Пер. с фр. / Под ред. Г.П. Катыса. — М.: Машиностроение, 1989. — 272 с.
  4. Кнут Д. Искусство программирования для ЭВМ / т. 3. Сортировка и поиск / Пер. с англ. / Под ред. Баяковского и Штаркмана. — М.: Мир, 1978. — 848 с.
  5. Шемакин Ю.И. Начала компьютерной лингвистики: учеб. пособие / Ю.И. Шемакин — М.: Изд-во МГОУ, А/О "Росвузнаука", 1992. — 115 с.
  6. Карпова И.П. Анализ ответов обучаемого в автоматизированных обучающих системах / И.П. Карпова // Информационные технологии, 2001, № 11. — с. 49-55.
  7. Anohina A. Using concept maps in adaptive knowledge assessment / A. Anohina, V. Graudina, J. Grundspenkis // Advances in Information Systems Development, 2006. — p. 469
  8. Yin Y. Comparison of two concept-mapping techniques: implications for scoring, interpretation, and use / Y. Yin, J. Vanides, M.A. Ruiz-Primo, C.C. Ayala, R.J. Shavelson — J. Res. Sci. Teaching, vol. 42, no. 2, 2005. — p. 166-184
  9. Novak J.D. The theory underlying concept maps and how to construct them / J.D. Novak, A.J. Canas — Technical Report IHMC CmapTools 2006-1.
  10. Problems and issues in the use of concept maps in science assessment / M.A. Ruiz-Primo, R.J. Shavelson — J. Res. Sci. Teaching, vol. 33, no. 6, 1996. — p. 569-600
  11. Bloom B.S. Taxonomy of educational objectives. Handbook I: The cognitive domain / B.S. Bloom — David McKay Co Inc., New York, 1956.
  12. Mogey N. The use of computers in the assessment of student learning / N. Mogey, H. Watt // G. Stoner (ed.) Implementing Learning Technology. Learning Technology Dissemination Initiative, 1996. — p. 50-57
  13. Papanastasiou E. Computer-adaptive testing in science education / E. Papanastasiou // Proc. of the 6th Int. Conf. on Computer Based Learning in Science, 2003. — p. 965-971
  14. Hunt D.P. The concept of knowledge and how to measure it / D.P. Hunt // Journal of Intellectual Capital, vol. 4. — p. 110-113
  15. Franken R.E. Human Motivation / R.E. Franken, 3rd ed. — Brooks Cole, Pacific Grove, CA, 1994.
  16. Koivula N. Performance on the Swedish Scholastic Aptitude Test: effects of self-assessment and gender / N. Koivula, P. Hassmen, D.P. Hunt // Sex Roles, vol. 44, no. 11/12, 2001. — p. 629-645