Evolution of Evaluation


“Evaluation is a very young discipline - although it is a very old practice.” (Scriven, 1996). This chapter presents an overview of the global development evaluation scenario. To understand the current scenario, an idea of the historical development is necessary; hence, it is presented first.

Global Picture

This section describes how evaluation evolved as a field and identifies important organisations and journals in evaluation at the international level. Most of the published literature in this field comes from the United States of America and some from Europe; there is therefore a clear western bias in the documentation of its history and of the important organisations and journals.

History of evaluation

The history of evaluation is as old as human activity. Humans (a) identify a problem or issue, (b) devise alternatives to tackle it, (c) evaluate the alternatives, and then (d) adopt those that results suggest will reduce the problem satisfactorily (Shadish & Luellen, 2005). Shadish and Luellen give examples of the earliest documented evaluations, from personnel evaluation in China over 4,000 years ago to the evaluation of the Hebrew diet in the Bible.

Program evaluation in the western world is divided into seven development periods: first, the period prior to 1900, the Age of Reform; second, 1900-1930, the Age of Efficiency; third, 1930-1945, the Tylerian Age; fourth, 1946-1957, the Age of Innocence; fifth, 1958-1972, the Age of Development; sixth, 1973-1983, the Age of Professionalization; and seventh, 1983-2000, the Age of Expansion and Integration (Hogan, 2007).

In the Age of Reform, the earliest documented evaluations concerned educational and production processes. In the Age of Efficiency, scientific management based on observation, measurement, analysis, and efficiency became prominent, and objective-based tests were used to determine the quality of educational instruction. In the Tylerian Age, criterion-referenced testing based on internal comparison of objectives and outcomes began. World War II was followed by a period of great growth in which accountability for national expenditure was ignored; this period is therefore labelled the Age of Innocence. Until this period, most of the literature on evaluation concerned educational evaluation. In the USA, with the Elementary & Secondary Education Act introducing supplementary programs to support the education of disadvantaged students, program evaluation as we know it started in the Age of Development. In the Age of Professionalization, many journals and university courses on evaluation were started and evaluation became established as a formal, independent professional field. With the increase in aid funding, professional associations and evaluation standards were established in the Age of Expansion and Integration (Hogan, 2007).

In the new millennium, the focus is on capacity development and building institutions for evaluation, where organisations like the United Nations Evaluation Group and the World Bank play a major role. Instead of multiple agencies following multiple standards, there is a move towards consultative standardisation. This I am terming the Age of Consolidation (2000-current). In the past few decades, the following trends emerged in program evaluation (Hogan, 2007):

·        Increased priority and legitimacy of internal evaluation.

·        Expanded use of qualitative methods  and  a  shift toward mixed quantitative-qualitative methods instead of depending exclusively on either.

·        Increased acceptance of and preference for multiple-method evaluations.

·        Introduction and development of theory-based evaluation.

·        Increased concern over ethical issues in conducting program evaluations and increased use of evaluation to empower program stakeholders.

·        Increased use  of  program  evaluation  within  business,  industry,  foundations, and other agencies in the private and non-profit sector.

·        Increased debate over whether evaluators should also be advocates for the programs they evaluate.

·        Advances in technology, communication, and ethical issues.

·        Modifications in evaluation strategies to accommodate increasing trends of government decentralization and delegation of responsibilities to states/provinces and localities.

 

 

  International organisations in evaluation

In the field of development program evaluation, a few organisations are widely recognised. These organisations, by virtue of their affiliations, are the leaders in the field. They are:

·        United Nations Evaluation Group, a platform for United Nations Evaluation Offices across units

·        Independent Evaluations Group of World Bank

·        International Organisation for Cooperation in Evaluation

·        International Development Evaluation Association

·        American Evaluation Association

·        European Evaluation Society

The first two are the evaluating agencies for a large share of development aid, while IOCE and IDEAS bring together different evaluation organisations. The last two are academic bodies which bring together leading evaluation practitioners and theorists from around the world.

The United Nations Evaluation Group was first established in January 1984 as the ‘Inter-Agency Working Group on Evaluation’ (IAWG), a part of the UN Consultative Committee on Programme and Operational Questions (CCPOQ). It is a group of heads of UN evaluation offices that discusses system-wide evaluation issues. UNEG’s initial work was on designing, testing, and introducing monitoring and evaluation systems for UN operations across specialised agencies, funds, programmes, and affiliated organisations. The UN Development Programme (UNDP), which funded most UN operations, provided the secretariat and leadership for the Group. It was renamed UNEG in 2003 (UNEG Secretariat, 2008). The UN also has an Office of Internal Oversight Services, established in 1994 by the General Assembly. The office assists the Secretary-General in his oversight responsibilities in respect of the resources and staff of the organization through audit, investigation, inspection, and evaluation (OIOS, 2018).

The Independent Evaluation Group (IEG) is independent of the management of the World Bank Group and reports directly to the Executive Board (IEG, 2018). It is charged with objectively evaluating the activities of the International Bank for Reconstruction and Development (IBRD) and the International Development Association (IDA; together, the World Bank), the work of the International Finance Corporation (IFC), and the Multilateral Investment Guarantee Agency’s (MIGA) guarantee projects and services, in order to provide accountability, enable course corrections, and avoid repetition of past mistakes in meeting the agenda of making the world free of poverty.

World Bank project evaluations began in 1970 through the Operations Evaluation Unit in the Programming and Budgeting Department. In 1973 it was renamed the Operations Evaluation Department and became independent from bank management. IFC established an evaluation unit in 1984, and in 1995 the unit increased its independence and was renamed the Operations Evaluation Group. MIGA created an evaluation office in 2002. In 2006 the Board of the Bank Group integrated these into a single unit, the Independent Evaluation Group (Wikipedia, 2017).

The International Organisation for Cooperation in Evaluation is a UNEG-supported movement that represents international, national, sub-national, and regional Voluntary Organizations for Professional Evaluation (VOPEs). It strengthens international evaluation through the exchange of evaluation methods and promotes good governance and recognition of the value evaluation has in improving peoples’ lives (IOCE, 2018). The EvalPartners group, managed by UNICEF and IOCE, is supported by various partners, including DevInfo, IDEAS, UN Women, UNEG, UNDP, ILO, IDRC, the Rockefeller Foundation, Better Evaluation, ReLAC, Preval, Agencia Brasileira de Avaliacao, SLEvA, and IPEN, all working together for SDG evaluation (Eval Partners, 2017).

The International Development Evaluation Association was established in 2002 as a global professional association for active development evaluators. It aims to improve and extend the practice of development evaluation by refining knowledge, strengthening capacity, and expanding networks, especially in developing countries (IDEAS, 2018). The American Evaluation Association (1986) and the European Evaluation Society (1992) were established to promote the use of evaluation and to enrich its theory and practice on the two continents.

 Global Evaluation Agenda (GEA) 2016-2020

To support monitoring and evaluation for achieving the 2030 Agenda for Sustainable Development, the United Nations adopted resolution 69/237 on 19 December 2014 on “building capacity for the evaluation of development activities at the country level”. This was a step towards building global cooperation for evaluation, the year 2015 having already been declared the International Year of Evaluation (EvalYear) at the 3rd International Conference on National Evaluation Capacities at São Paulo, Brazil, in September 2013. The idea behind this was to advocate and promote evaluation and evidence-based policy making at international, regional, national, and local levels (EvalPartners, 2016).

The Global Evaluation Agenda (GEA) 2016-2020 is the first ever long-term global vision for evaluation. The GEA was developed through a broad global collaboration under the EvalPartners umbrella. Discussions around evaluation capacities and capabilities intensified during the Year of Evaluation in 2015, celebrated at 92-plus events around the world. The Year of Evaluation culminated in a historic global gathering hosted by the Parliament of Nepal in Kathmandu, where the GEA was launched and endorsed by various stakeholders including governments, parliaments, civil society, and academia, in an atmosphere of global solidarity and partnership (EvalPartners, 2016).

EvalAgenda2020 aims to strengthen the four essential dimensions of the evaluation system: the enabling environment for evaluation, institutional capacities, individual capacities for evaluation, and the inter-linkages among the first three dimensions (EvalPartners, 2016).

 

 Development Evaluation in Independent India

A system of evaluation was conceived in India along with the planned economy. With the launch of the first five-year plan in 1951, a need for systematic evaluation was felt, and the first plan deemed that systematic evaluation should become a normal administrative practice in all spheres of public activity. For this, the Planning Commission (PC) began developing evaluation techniques by establishing the Program Evaluation Organisation (PEO) for independent evaluations of community projects and other intensive area development programmes (Chandrasekar, 2015). From there, India has come a long way over the past 67 years.

Dr S. Chandrasekar served as the Director of the Regional Evaluation Office at Chennai and then as Adviser at the Directorate of Economics and Statistics, Ministry of Agriculture, New Delhi. He wrote an article about the history of development evaluation in India, published as a web special by Yojana in November 2015, around the time when a lot of changes were happening in the Indian evaluation scenario. Most of this section is based on his article and a World Bank report on the M&E system in India (Chandrasekar, 2015; Mehrotra, 2013).

 Evolution of evaluation institutions in India

The  history  of  institutionalised  development program evaluation  can  be  divided  into following phases, based on how the Government of India treated its evaluation organisations:

1. Planned economy phase 1952-1973

2. Neglect phase 1973-1995

3. Resurgence phase 1995-2013

4. New institutions and paradigm phase 2013-current

  Planned economy phase 1952-1973

The PEO was established in October 1952 as an independent organisation under the Planning Commission to evaluate development programs implemented in the first five-year plan and to bring out their successes and failures through reports. Over the first four five-year plans, PEO activities expanded considerably, and most states established their own evaluation units in the sixties for state-level programs, working in tandem with the PEO for cross-verification and learning. The scope of the PEO extended to include plan schemes/programmes in the sectors of health, agriculture and cooperation, rural industries, fisheries, family welfare, rural development, rural electrification, public distribution system, tribal development, social forestry, etc. Later, the PEO also evaluated Centrally Sponsored Schemes (CSS) (Chandrasekar, 2015).

The PEO, a field-based organisation, had a three-tiered structure: Headquarters in New Delhi at the highest level, 3 Regional Evaluation Offices at the middle level, and 20 Project Evaluation Offices at the lowest level. Beyond these were the state offices, taking the total number of offices to 40 and the staff strength to over 500. The PEO had relative autonomy, as all its offices and the state evaluation offices reported to the Director, PEO. The evaluation reports were a major part of the annual conference of State Development Commissioners, enabling follow-up actions (Mehrotra, 2013).

Neglect phase 1973-1995

With the reduction in the scope of Planning Commission activities in the early seventies, on the recommendations of the Administrative Reforms Commission, the PEO began its phase of decline and neglect. While the extent of its work was expanded to include urban areas too, the scope of its evaluations was reduced to operational, financial, and administrative aspects of schemes and programs, rather than the overall design of programs and their impacts. It was recommended that only those studies should be taken up whose results could be made available quickly for use by line divisions. This was accompanied by the appointment of Indian Economic Service officers, who are generalists compared to the earlier subject-specialist academicians, as heads of the PEO. The PEO's internal functions were merged with the Planning Commission in April 1973, reducing it to a division within a department (Chandrasekar, 2015). Around the same time, based on the recommendations of the Staff Inspection Unit of the Ministry of Finance, field offices were reduced from 40 to 27 by the end of the seventies (Mehrotra, 2013).

The PEO featured only briefly in later plans and received insufficient financial outlays, limiting its ability to bring out good reports on time. Its reports were delayed, no longer covered program impact and design, and were given less importance by the concerned ministries, thus reducing their use. This in turn reduced the number of studies being done (Chandrasekar, 2015).

Resurgence phase (1995-2013)

The resurgence in demand for evaluation can be traced to the late nineties, when the Planning Commission got involved in the design and implementation of social safety net programs to counter the adverse effects of the economic reforms initiated earlier. Unfortunately, the Fiscal Responsibility and Budget Management Act 2003 ensured that the PEO and its field offices remained highly understaffed. This began the practice of outsourcing studies to social science research institutes. From the ninth plan onwards (1997-2002), the PEO involved the ministries and subject matter expert groups in ensuring that some actions were taken based on its reports.

The eleventh five-year plan (2007-2012) stressed building online MIS for all flagship programs. A development monitoring unit was set up in the Prime Minister's Office in 2009, and a Performance Monitoring and Evaluation System (PMES) was created at the Cabinet Secretariat. The functions of monitoring and evaluation were being mixed together. A scheme named Strengthening Evaluation Capacity was launched in 2006-07 to ease the financial problems at the PEO, but it did little to address the administrative and staffing problems (Chandrasekar, 2015).

During this phase of resurgence in demand for evaluation activities, mixing up monitoring and evaluation, ignoring the plight of the PEO, underutilising studies, and outsourcing to private institutions without a clear policy were a few grave mistakes made. As a result, in 2012 there were only 6 regional and 8 project offices left (PEO, 2012).

New institutions and paradigms phase (2013-current)

A new Independent Evaluation Office (IEO) was established in the 12th plan with a mandate to “conduct evaluation of plan programmes, especially the large flagship programmes to assess their effectiveness, relevance and impact. It also has the freedom to conduct independent evaluations on any programme which has access to public funding or implicit or explicit guarantee from the government.” Instead of using the regular organised services available to government, it proposed to get evaluations done by selected institutes and researchers identified through tender processes (Chandrasekar, 2015). Not much is known about how the IEO was expected to function and how it differed from the PEO.

With the change in regime and the dissolution of the Planning Commission in 2014, the PEO and IEO were merged into the Development Monitoring and Evaluation Office (DMEO) in September 2015. In 2017, most field offices were shut down and their staff were attached to the DMEO at New Delhi (Indian Express, 2017). Even fewer details are available on official websites about this office than about the PEO (and IEO). The PMES started earlier has now been replaced by the Pragati dashboard for direct follow-up by the PMO for better implementation, but this misses any opportunity for evaluations based on the Results Framework Documents prepared by the ministries (The Economic Times, 2015).

 Concurrent evaluations

In the resurgence phase, concurrent evaluations were being regularly done by ministries themselves for their programs. For example, the National Food Security Mission under the Department of Agriculture and Cooperation, Ministry of Agriculture, was carrying out its own concurrent evaluations in 2010 (NFSM Cell, 2010), and the Ministry of Rural Development had a Concurrent Evaluation Office (CEO), set up to manage the Concurrent Evaluation Network (CENET) of MoRD in conjunction with the IEO. The CEO was closed in July 2016 (PIB, 2016).

A concurrent evaluation is either a formative or a process evaluation, carried out annually, which evaluates all the activities undertaken to achieve program objectives. Concurrent evaluations have been done in the past too; an example is the concurrent evaluation of the Integrated Rural Development Program carried out by the Department of Rural Development, Ministry of Agriculture, in 36 districts of the country from October 1985 for at least a year. As ordinary evaluations in that era were usually ex post facto, they did not provide remedial measures and mid-term corrections, so a need for concurrent evaluation was felt (Saxena, 1987). The term concurrent evaluation isn't common outside India, where the term self-evaluation is used for internal, regular evaluations (UNEP, 2008).

Current Scenario

The past decade has been very eventful for evaluation systems in India. The IEO was set up and closed, the PEO was closed, the Results Framework Document-based PMES was started and closed, and the DMEO has been started recently. This section captures the current scenario.

 

 DMEO at NITI Aayog, New Delhi

While the Development Monitoring and Evaluation Office (DMEO) was established in 2015 and NITI Aayog has a functional and updated website, very little information is available about the DMEO, even in the Digital India age. The little information available comes from a few newspaper articles and the telephone directory of NITI Aayog. While the 2016 contacts document mentions 7 regional DME offices and 8 project DME offices, the 2018 document mentions no regional or project offices (NITI Aayog, 2018). This change is also hinted at in a 2017 news report which mentions that the 15 offices were being shut down and staff called to headquarters in Delhi (Indian Express, 2017).

In the current set-up, the DMEO has a Director General at the helm, a Joint Secretary, two Deputy DGs, an Under Secretary, and staff attached to their offices. On the technical/specialist side, there are a few Senior Research Officers, Senior Statistical Officers, a Senior Consultant, and many Economics Officers, Consultants, Research Associates, and Young Professionals, a total of about 25-26 people. There is some administrative staff as well (NITI Aayog, 2018).

In 2016, the DMEO called for Expressions of Interest from research institutions, NGOs, and universities for carrying out evaluation studies. While this call for EoI is available online, the final list is not found on the NITI Aayog website. As per the mandate of the DMEO, it is expected to get evaluation studies done as requested by various ministries for their programs. This is similar to what the PEO and IEO were doing.

Evaluation in Indian states

Evaluation was an integral component of every state's planning and implementation process while the PEO was flourishing. States have taken varied paths in the past few decades since then. While evaluation is reported just as an activity under the Directorate of Economics and Statistics in the Planning Department in most states, Karnataka has an Evaluation Authority, and in Goa and Sikkim evaluation is part of the name of the directorate. When we look at the official websites, we see that evaluation occupies an important position in many states.

Across the states, evaluation is generally a function of the Planning Department, which has the Directorate of Economics & Statistics, responsible for all statistical data collection and analysis and, in most states, for monitoring and evaluation functions. Most of these functions started during the third plan period (1961-66) (PEO, 2006).

Outsourcing of evaluation studies to competent agencies has been going on for a couple of decades, and the websites, mostly developed in the last 10 years, show records of processes carried out by various states since 2012-13, under the 12th five-year plan. Unlike Maharashtra, though, very few states refer to the UN guidelines in their empanelment process.

Records of how the feedback generated by these studies is used are poor. The Program Evaluation Organisation brought out one study in 2004 and another in 2006, titled Development Evaluation in PEO and Its Impact (Vol I and Vol II), which summarise the follow-up actions taken based on the evaluation studies done in the preceding years (PEO, 2006). Beyond this, not much is documented.

Construction of Attitude Scales

“An attitude is a dispositional readiness to respond to certain institutions, persons or objects in a consistent manner which has been learnt and has become one’s typical mode of response.”

—Frank Freeman

“An attitude denotes the sum total of man’s inclinations and feelings, prejudice or bias, pre-conceived notions, ideas, fears, threats and convictions about any specific topic.”

—Thurstone

 

Steps in construction of Likert attitude scale:

1) Discussion: Informally discuss the issues with the people concerned, extension workers, experts, and NGOs, and also consult secondary sources. For example, if an investigator wants to develop a scale on attitudes towards schizophrenic patients among caregivers, discuss the topic with caregivers, staff nurses who provide care to schizophrenic patients, experts in the field such as psychiatrists, psychologists, psychiatric nurses, and psychiatric social workers, and NGOs.

2) Review: Review the literature related to the particular topic of interest. Refer to journals, books, articles, and internet sources. The literature review helps in the process of generating items for the scale.

3) Writing statements: Based on the discussion and the review, collect a set of statements on the issues. Make the items simple and straightforward so that respondents are able to fill out the scale quickly and easily.

Writing positive and negative statements: Write acceptance or rejection statements; each should imply a different degree of favourable or unfavourable attitude towards the issue the investigator intends to assess. A statement or item can be positive or negative. Positive statements should be statements which are acceptable to those having the attitude and just as unacceptable to those not having it. For example, “I frequently use library resources to go beyond the required reading.” Negative statements should be statements which are acceptable to those not having the attitude and just as unacceptable to those having it. For example, “Homework assignments are designed to meet course requirements. It is impractical in time and energy to do more than is required.”

4) Create an item pool: Continue writing items, both positive and negative, until the item pool is at least twice the size of the intended instrument. For example, if an investigator plans to have 20 items in the final scale, create an item pool of 40 items.

5) Editing of items: After having collected as many relevant statements as possible, the next step is to go through each item carefully. Criteria for editing:

·        Avoid statements that refer to the past rather than to the present. For example, “At one time small pox affected a large number of people.”

·        Avoid statements that are factual or capable of being interpreted as factual. For example, “Using PowerPoint slides is a modern medium in educational technology.”

·        Avoid statements likely to be endorsed by almost everyone or by no one. For example, “Admission to a private hospital proves expensive.”

·        Avoid statements irrelevant to the object under consideration. For example, “In future there will surely be a treatment for AIDS.”

·        Avoid statements containing more than one thought, and avoid double negatives. For example, “Most people do not think that AIDS does not cure.”

·        Avoid words that may not be understood by the respondents. For example, “Depot injection is a more advanced form.”

·        Avoid universals such as all, always, none, never, often, etc., as these introduce ambiguity. Also avoid words such as only, just, merely, etc.

·        Avoid biased language; it is important to avoid using emotional words or phrases in items.

·        Avoid double-barrelled questions, where the item actually combines two different questions into one. For example, “Do you think that the nursing service department is prompt and helpful?” Any item that includes the word “and” should be closely examined to see whether it is actually a double-barrelled question.

·        Avoid non-monotonic questions, where people could give the same answer for different reasons. For example, “Only people in the nursing profession should be allowed to wear white uniforms.” Some could disagree with this item either because they feel that others besides nurses should be allowed to wear white uniforms, or because they feel that no one should be allowed to wear white uniforms.

6) Rank: After editing, rate the items and rank order them on clarity and potency, then select the items. Choose the number of response categories: five categories are fairly standard; some scale constructors use seven, and some prefer four or six response categories with no middle category. All of these seem to work satisfactorily.

7) Scoring: The points given for each response depend on whether the statement is positive or negative. A person who strongly agrees with a positive statement gets the maximum points; one who strongly disagrees with a positive statement gets the minimum points. For the purpose of scoring, assign the numerical value of 5 to strongly agree, 4 to agree, 3 to undecided, 2 to disagree, and 1 to strongly disagree. If the item is negative, reverse the order of scoring: 5 for strongly disagree, 4 for disagree, 3 for undecided, 2 for agree, and 1 for strongly agree.
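As a simple illustration of this scoring rule, here is a minimal Python sketch; the item identifiers, responses, and the set of negatively worded items are hypothetical examples, not part of any particular published scale:

```python
# Minimal sketch of Likert scoring with reverse-coded negative items.
# Item identifiers and responses below are hypothetical illustrations.

SCORE = {"SA": 5, "A": 4, "U": 3, "D": 2, "SD": 1}

def score_item(response: str, negative: bool) -> int:
    """Return the score for one response; reverse the scale for negative items."""
    value = SCORE[response]
    return 6 - value if negative else value  # 5<->1, 4<->2, 3 stays 3

def total_score(responses: dict, negative_items: set) -> int:
    """Sum item scores over one respondent's answers."""
    return sum(score_item(r, item in negative_items) for item, r in responses.items())

# Hypothetical respondent: items q1..q4, where q3 and q4 are negative statements.
responses = {"q1": "SA", "q2": "A", "q3": "D", "q4": "SD"}
print(total_score(responses, negative_items={"q3", "q4"}))  # 5 + 4 + 4 + 5 = 18
```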

8) Write instructions which clearly explain how to select responses on the form. Write in simple and easily understandable language.

9) Formatting the scale: Randomly order the selected items. Use letters to indicate the choices, such as SD, D, U, A, SA.
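Random ordering of items can also be done programmatically. A minimal sketch, reusing the two example statements given earlier; the layout shown is only illustrative:

```python
# Minimal sketch: randomly order the selected items and present the
# standard response choices. The statements below are the earlier examples.
import random

items = [
    "I frequently use library resources to go beyond the required reading.",
    "Homework assignments are designed to meet course requirements; it is "
    "impractical in time and energy to do more than is required.",
]
choices = ["SD", "D", "U", "A", "SA"]   # strongly disagree ... strongly agree

random.shuffle(items)                   # randomise presentation order
for number, statement in enumerate(items, start=1):
    print(f"{number}. {statement}  [{' / '.join(choices)}]")
```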

10) Validity: Validity is the extent to which the measure provides an accurate representation of what one is trying to measure. Validity concerns both systematic and variable error components. A systematic error, also known as bias, is one that occurs in a consistent manner each time something is measured. For example, a biased question would produce an error in the same direction each time it is asked; such an error would be a systematic error. A variable error is one that occurs randomly each time something is measured. For example, a response that is less favourable than the true feeling because the respondent was in a bad mood (a temporary characteristic) would not occur each time that individual's attitude is measured; in fact, an error in the opposite direction (overly favourable) would occur if the individual were in a good mood. This represents a variable error.
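The contrast between systematic and variable error can be made concrete with a small simulation. This is a minimal sketch, assuming a hypothetical "true" attitude score, a constant bias standing in for systematic error, and random noise standing in for variable error:

```python
# Minimal sketch contrasting systematic error (constant bias) with
# variable error (random noise). All numbers are hypothetical.
import random

random.seed(0)
true_score = 40                     # the respondent's stable "true" attitude score

# Systematic error: every measurement is shifted in the same direction.
biased = [true_score + 3 for _ in range(5)]          # always 3 points too favourable

# Variable error: each measurement is shifted randomly, in either direction.
noisy = [true_score + random.randint(-3, 3) for _ in range(5)]

print("systematic error:", biased)   # identical values, all wrong by the same amount
print("variable error:  ", noisy)    # values scatter around the true score
```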

 

11) Reliability: The term reliability refers to the degree of variable error in a measurement. Reliability is the extent to which a measurement is free of variable errors. This is reflected when repeated measures of the same stable characteristic in the same objects show limited variation.

NEED FOR EVALUATION MODELS


Evaluation is an integral part of most instructional designs. Evaluation tools and methodologies help to determine the effectiveness of instructional interventions. There are different types of evaluation models.

 

Definition:

“Evaluation models either describe what evaluators do or prescribe what they should do.” An evaluation model is a systematic approach that guides the measurement of the efficiency and effectiveness of a training, a course, or an educational program.

 

Different models target different things, but in general they look at questions such as:

·       Was the training successful?

·       What did the participants learn?

·       Did the participants use what they learned on-the-job?

·       What was the impact on the organization?

·       Was the training a good investment?

·       Did the training offer value for money?

·       Could the training be improved?

 

 

In general, evaluation models:

·       Provide a systematic method to study a program, practice, intervention, or initiative to understand how well it achieves its goals.

·       Offer systematic frameworks for investigating and analyzing the effectiveness of training or learning journeys.

·       Suggest improvements for continued efforts.

·       Seek support for continuing the program.

·       Gather information on the approach that can be shared with others.

·       Help determine whether an approach would be appropriate to replicate in other locations with similar needs.

PROCEDURE OF CONSTRUCTION OF SPECIAL APTITUDE TESTS


Aptitude is defined as the natural, learned or acquired ability to do something. It is the readiness of an individual, based on his willingness and ability, to acquire some skill or knowledge particular to certain activities. Knowledge of aptitude can help us to predict an individual's future performance. Hence, an aptitude assessment looks at one or more clearly defined and relatively homogeneous segments of ability. It assesses a test taker's potential for learning or ability to perform in a new situation based upon their cumulative life experiences. An aptitude test is designed to assess what a person is capable of doing or to predict what a person is able to learn or do given the right education and instruction. It represents a person's level of competency to perform a certain type of task. Such aptitude tests are often used to assess academic potential or career suitability and may be used to assess either mental or physical talent in a variety of domains.

Examples of Aptitude Tests

 Some examples of aptitude tests include:

 · A test assessing an individual's aptitude to become a fighter pilot

· A career test evaluating a person's capability to work as an air traffic controller

· An aptitude test given to high school students to determine which types of careers they might be good at

 · A computer programming test to determine how a job candidate might solve different hypothetical problems

 · A test designed to test a person's physical abilities needed for a particular job such as a police officer or firefighter

 

Students often encounter a variety of aptitude tests throughout school as they think about what they might like to study in college or do as a career someday. These tests are designed to help them determine what they should study or pursue, and they can sometimes give a general idea of what might interest a student as a future career. For example, a student might take an aptitude test suggesting that they are good with numbers and data. Such results might imply that a career as an accountant, banker, or stockbroker would be a good choice for that particular student. Another student might find that they have strong language and verbal skills, which might suggest a career as an English teacher, writer, or journalist. Aptitude tests may be specialized according to the skill or ability measured, such as artistic ability, manual dexterity, clerical skills, and motor abilities, or they may be general.

 

Structure of a basic Aptitude Tests

Usually, basic aptitude tests are divided into sections that gauge numerical ability, logical reasoning, verbal comprehension, spatial awareness and cognitive ability. These sections can vary, depending on the qualities sought by an employer or institute. However, the elements common to most versions of ability and aptitude tests are listed below.

  • Most of these tests contain multiple-choice questions.
  • There can also be mathematical equations and true/false question formats.
  • The questions are designed to assess a candidate’s ability to process information quickly and devise accurate solutions/answers.
  • Candidates are expected to finish every section within a fixed duration.

Special Aptitude Tests

 Special aptitude tests are those designed to look at an individual's capacity in a particular area. For example, imagine that a business wants to hire a computer programmer to work for their company. They will likely look at a range of things including work history and interview performance, but they might also want to administer an aptitude test to determine if job candidates possess the necessary skill to perform the job. This special aptitude test is designed to look at a very narrow range of ability: how skilled and knowledgeable the candidate is at computer programming.

 

The different Special Aptitude Tests are as under:

(A)  Mechanical Aptitude Test: Like intelligence, mechanical aptitude is also made up of many components.

A number of tests are available for measuring mechanical aptitude for a fairly large field of occupations rather than for a single occupation.

·       Minnesota Mechanical Assembly Test.

·       Minnesota Spatial Relations Test.

·       Minnesota Paper Form Board

·       Johnson O’Connor’s Wiggly Blocks.

·       Sharma’s Mechanical Aptitude Test Battery.

·       Stenquist Mechanical Aptitude Tests, etc.

These tests usually include the following types of items:

·       Asking the subject to put together the parts of mechanical devices

·       Asking him to replace cutouts of various shapes in corresponding spaces on a board

·       Solving geometrical problems

·       Questions concerning the basic information about tools and their uses

·       Questions relating to the comprehension of physical and mechanical principles

For instance, the Bennett mechanical comprehension test has 60 items in pictorial form. They present mechanical problems arranged in order of difficulty and involve comprehension of mechanical principles found in ordinary situations.

 

(B)  Clerical Aptitude Tests: Like mechanical aptitude, clerical aptitude is also a composite function. According to Bingham, it involves several specific abilities, namely:

·       Perceptual ability. The ability to register words and numbers with speed and accuracy.

·       Intellectual ability. The ability to grasp the meaning of words and symbols.

·       Motor ability. The ability to use various types of machines and tools like a typewriter, duplicator, cyclostyle machine, etc.

A number of tests are available for measuring clerical aptitude:

·       Minnesota Clerical Aptitude Test.

·       General Clerical Aptitude.

·       The Detroit Clerical Aptitude Examination.

·       P.R.W. Test.

·       Orissa Test of Clerical Aptitude.

·       Clerical Aptitude Test

 

(C) Tests of Artistic Aptitude: Some tests have been devised to measure artistic aptitude.

Some such tests are listed below:

i. Graphic Arts Test: These tests are devised to discover the talent for graphic art

ii. Musical Aptitude Tests:

iii. Literary Aptitude Tests:

 

(D) Professional Aptitude Tests: These tests primarily measure aptitude for different professions. Such tests are administered before admission into professional institutions like medical, legal, engineering institutions. There are many tests to measure aptitude in medicine, science, mathematics, law, engineering, teaching etc.

 

(E) Scholastic Aptitude Tests: These tests measure the scholastic aptitudes. Some examples of such tests are Scholastic Aptitude Tests and Graduate Record Examination.

 

(F) Other Tests like Motor Dexterity Tests: Other Tests like Motor Dexterity Tests, Sensory

 Tests, Visual Tests and Auditory Tests.

Test construction.

The process of constructing aptitude tests involves a rather technical sequence combining the ingenuity of the psychologist, experimentation and data collection with suitable samples of individuals, the calculation of quantitative indexes for items and total test scores, and the application of appropriate statistical tests at various stages of test development. Some of the indexes applied in the construction phase are difficulty levels, the proportion of responses actually made to the various alternatives provided in multiple-choice tests, and the correlation of item scores with total test scores or with an independent criterion. A well-developed aptitude test goes through several cycles of these evaluations before it is even tried out as a test. The more evidence there is in the test manual of such rigorous procedures, the more confidence we can have in the test.
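Two of the indexes mentioned above can be illustrated with a small item-analysis sketch. This is a minimal illustration under assumptions, not the procedure of any particular test battery: the 0/1 response matrix is hypothetical, and statistics.correlation requires Python 3.10 or later.

```python
# Minimal sketch of two common item-analysis indexes:
#   difficulty level = proportion of examinees answering the item correctly
#   item-total r     = correlation of item scores with total test scores
# The response matrix below (rows = examinees, 1 = correct) is hypothetical.
from statistics import correlation  # Python 3.10+

responses = [
    [1, 1, 0, 1],
    [1, 0, 0, 1],
    [0, 1, 0, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 1],
]
totals = [sum(row) for row in responses]          # total score per examinee

for item in range(len(responses[0])):
    item_scores = [row[item] for row in responses]
    difficulty = sum(item_scores) / len(item_scores)
    r = correlation(item_scores, totals)          # item-total correlation
    print(f"item {item + 1}: difficulty = {difficulty:.2f}, item-total r = {r:.2f}")
```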

There are other problems that generally must be considered in evaluating test scores. Before a test is actually used, a number of conditions have to be met. There is a period of “testing the tests” to determine their applicability in particular situations. A test manual should be devised to provide information on this. Furthermore, there is the question of interpreting a test score.

Standardization. The concept of standardization refers to the establishment of uniform conditions under which the test is administered, ensuring that the particular ability of the examinee is the sole variable being measured. A great deal of care is taken to ensure proper standardization of testing conditions. Thus, the examiner’s manual for a particular test specifies the uniform directions to be read to everyone, the exact demonstration, the practice examples to be used, and so on. The examiner tries to keep motivation high and to minimize fatigue and distractions.

Reliability. One of the most important characteristics of a test is its reliability. This refers to the degree to which the test measures something consistently. If a test yielded a score of 135 for an individual one day and 85 the next, we would term the test unreliable. Before psychological tests are used they are first evaluated for reliability. This is often done by the test-retest method, which involves giving the same test to the same individuals at two different times in an attempt to find out whether the test generally ranks individuals in about the same way each time. 
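The test-retest method described above amounts to correlating the scores from the two administrations. A minimal sketch, assuming hypothetical score pairs and Python 3.10+ for statistics.correlation:

```python
# Minimal sketch: test-retest reliability as the correlation between
# two administrations of the same test. Scores below are hypothetical.
from statistics import correlation  # Python 3.10+

test_scores   = [72, 65, 80, 58, 90, 77]   # first administration
retest_scores = [70, 66, 78, 60, 88, 75]   # second administration

r = correlation(test_scores, retest_scores)
print(f"test-retest reliability estimate: r = {r:.2f}")  # close to 1.0 = rankings preserved
```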

Validity. An essential characteristic of aptitude tests is their validity. Whereas reliability refers to consistency of measurement, validity generally means the degree to which the test measures what it was designed to measure. A test may be highly reliable but still not valid.

The selection ratio. Another important factor affecting the success of aptitude tests in personnel selection procedures is the selection ratio. This is the ratio of those selected to those available for placement. If there are only a few openings and many applicants, the selection ratio is low; and this is the condition under which a selection program works best.

GOAL ATTAINMENT MODEL

 

Ralph W. Tyler (1950) proposed a goal attainment model. Tyler describes education as a process in which three different foci should be distinguished: educational objectives, learning experiences, and examination of achievements. According to him, evaluation means an examination of whether the desired educational objectives have been attained or not. The Tyler model has been used mainly to evaluate the achievement level of either individuals or groups of students. The evaluator working with this model is interested in the extent to which students are developing in the desired way. The relationship between educational objectives and students' achievement constitutes only a portion of the model; the study of the other relationships described in the model also forms part of curriculum evaluation.

Tyler's goal attainment model, sometimes called the objective-centered model, is the basis for the most common models in curriculum design, development, and evaluation.

 

Major parts of Tyler model

 

The Tyler model comprises four major parts. These are:

 

1) Defining objectives of the learning experience

 

2) Identifying learning activities for meeting the defined objectives

 

3) Organizing the learning activities for attaining the defined objectives

 

4) Evaluating and assessing the learning experiences.

 

 

 

 

The Tyler model begins by defining the objectives of the learning experience. These objectives must have relevance to the field of study and to the overall curriculum. Tyler's model obtains the curriculum objectives from three sources:

·             The student

·             The society, and

·             The subject matter.

Nature and characteristics of Tyler's objective model

 

«   The nature of Tyler's objective model is that it evaluates the degree to which an instructional program's goals or objectives were achieved.

 

«    The model mainly involves the careful formulation of goals from three sources (the student, the society, and the subject matter), filtered through two goal screens (a psychology of learning and a philosophy of education).

 

«    The resulting goals are then transformed into measurable objectives.

With Tyler's evaluation, the evaluator can determine the level to which the objectives of the program are achieved; attained objectives indicate a successful instructional program. However, the objectives may change during the implementation of the program, or the program may not have clear objectives at all.

 

«  Tyler's objectives model can only be used to evaluate programs with clear and stable objectives.

 

 

 

 

              Criticism of Tyler's goal attainment model

 

«   The first criticism is that it is difficult and time-consuming to construct behavioural objectives. Tyler's model relies mainly on behavioural objectives. The objectives in Tyler's model come from three sources (the student, the society, and the subject matter), and all three sources have to agree on which objectives need to be addressed. This is a cumbersome process; thus, it is difficult to arrive at a consensus among the various stakeholder groups.

«       The second criticism is that it is too restrictive and covers only a small range of student skills and knowledge.

«      The third criticism is that Tyler's model is too dependent on behavioural objectives, and it is difficult to state plainly in behavioural terms those objectives that cover non-specific skills such as critical thinking and problem solving, and objectives related to value-acquisition processes.

«   The fourth and last criticism is that the objectives in Tyler's model are too student-centered, and therefore teachers are not given any opportunity to manipulate the learning experiences as they see fit to evoke the kind of learning outcomes desired.
