Evolution of Evaluation


“Evaluation is a very young discipline - although it is a very old practice.” (Scriven, 1996). This chapter presents an overview of the global development evaluation scenario. To understand the current scenario, an idea of the historical development is necessary; hence, it is presented first.

Global Picture

This section describes how evaluation evolved as a field and identifies important organisations and journals in evaluation at the international level. Most of the published literature in this field comes from the United States of America and some from Europe; there is therefore a clear western bias in the documentation of its history and of the important organisations and journals.

History of evaluation

The history of evaluation is as old as human activity. Humans (a) identify a problem or issue, (b) devise alternatives to tackle it, (c) evaluate the alternatives, and then (d) adopt those that results suggest will reduce the problem satisfactorily (Shadish & Luellen, 2005). Shadish and Luellen give examples of the earliest documented evaluations, from personnel evaluation in China over 4,000 years ago to the evaluation of the Hebrew diet in the Bible.

Program evaluation in the western world is divided into seven development periods: first, the period prior to 1900, the Age of Reform; second, 1900-1930, the Age of Efficiency; third, 1930-1945, the Tylerian Age; fourth, 1946-1957, the Age of Innocence; fifth, 1958-1972, the Age of Development; sixth, 1973-1983, the Age of Professionalization; and seventh, 1983-2000, the Age of Expansion and Integration (Hogan, 2007).

In the Age of Reform, the earliest documented evaluations concerned educational and production processes. In the Age of Efficiency, scientific management based on observation, measurement, analysis, and efficiency became prominent, and objective-based tests were used to determine the quality of educational instruction. In the Tylerian Age, criterion-referenced testing based on internal comparison of objectives and outcomes began. World War II was followed by a period of great growth in which accountability for national expenditure was ignored; this period is therefore labelled the Age of Innocence. Until this period, most of the literature on evaluation concerned educational evaluation. In the USA, with the Elementary & Secondary Education Act introducing supplementary programs to support the education of disadvantaged students, program evaluation as we know it started in the Age of Development. In the Age of Professionalization, many journals and university courses on evaluation were started and evaluation became established as a formal, independent professional field. With the increase in aid funding, professional associations and evaluation standards were established in the Age of Expansion and Integration (Hogan, 2007).

In the new millennium, the focus is on capacity development and building institutions for evaluation, where organisations like the United Nations Evaluation Group and the World Bank play a major role. Instead of multiple agencies following multiple standards, there is a move towards consultative standardisation. This I am terming the Age of Consolidation (2000-current). In the past few decades, the following trends emerged in program evaluation (Hogan, 2007):

·        Increased priority and legitimacy of internal evaluation.

·        Expanded use of qualitative methods  and  a  shift toward mixed quantitative-qualitative methods instead of depending exclusively on either.

·        Increased acceptance of and preference for multiple-method evaluations.

·        Introduction and development of theory-based evaluation.

·        Increased concern over ethical issues in conducting program evaluations and increased use of evaluation to empower program stakeholders.

·        Increased use  of  program  evaluation  within  business,  industry,  foundations, and other agencies in the private and non-profit sector.

·        Increased debate over whether evaluators should also be advocates for the programs they evaluate.

·        Advances in technology, communication, and ethical issues.

·        Modifications in evaluation strategies to accommodate increasing trends of government decentralization and delegation of responsibilities to states/provinces and localities.

 

 

  International organisations in evaluation

In the field of development program evaluation, a few organisations are widely recognised. These organisations, by virtue of their affiliations, are the leaders in the field. They are:

·        United Nations Evaluation Group, a platform for United Nations Evaluation Offices across units

·        Independent Evaluations Group of World Bank

·        International Organisation for Cooperation in Evaluation

·        International Development Evaluation Association

·        American Evaluation Association

·        European Evaluation Society

The first two are the evaluating agencies for a large share of development aid, while IOCE and IDEAS bring together different evaluation organisations. The last two are academic bodies which bring together leading evaluation practitioners and theorists from around the world.

The United Nations Evaluation Group was first established in January 1984 as the ‘Inter-Agency Working Group on Evaluation’ (IAWG), a part of the UN Consultative Committee on Programme and Operational Questions (CCPOQ). It is a group of heads of UN evaluation offices that discusses system-wide evaluation issues. UNEG’s initial work was on designing, testing, and introducing monitoring and evaluation systems for UN operations across specialised agencies, funds, programmes, and affiliated organisations. The UN Development Programme (UNDP), which funded most UN operations, provided the secretariat and leadership for the Group. It was renamed UNEG in 2003 (UNEG Secretariat, 2008). The UN also has an Office of Internal Oversight Services, established in 1994 by the General Assembly. The office assists the Secretary-General in his oversight responsibilities in respect of the resources and staff of the organization through audit, investigation, inspection, and evaluation (OIOS, 2018).

The Independent Evaluation Group (IEG) is independent of the management of the World Bank Group and reports directly to the Executive Board (IEG, 2018). It is charged with objectively evaluating the activities of the International Bank for Reconstruction and Development (IBRD) and the International Development Association (IDA; together, the World Bank), the work of the International Finance Corporation (IFC), and the Multilateral Investment Guarantee Agency’s (MIGA) guarantee projects and services, in order to provide accountability, enable course corrections, and avoid repetition of past mistakes in meeting the agenda of making the world free of poverty.

World Bank project evaluations began in 1970 through the Operations Evaluation Unit in the Programming and Budgeting Department. In 1973 it was renamed the Operations Evaluation Department and became independent from bank management. IFC established an evaluation unit in 1984, and in 1995 the unit increased its independence and was renamed the Operations Evaluation Group. MIGA created an evaluation office in 2002. In 2006 the Board of the Bank Group integrated these into a single unit, the Independent Evaluation Group (Wikipedia, 2017).

The International Organisation for Cooperation in Evaluation is a UNEG-supported movement that represents international, national, sub-national, and regional Voluntary Organizations for Professional Evaluation (VOPEs). It strengthens international evaluation through the exchange of evaluation methods and promotes good governance and recognition of the value evaluation has in improving peoples’ lives (IOCE, 2018). The EvalPartners group, managed by UNICEF and IOCE, is supported by various partners, including DevInfo, IDEAS, UN Women, UNEG, UNDP, ILO, IDRC, the Rockefeller Foundation, Better Evaluation, ReLAC, Preval, Agencia Brasileira de Avaliacao, SLEvA, and IPEN, all working together for SDG evaluation (Eval Partners, 2017).

The International Development Evaluation Association was established in 2002 as a global professional association for active development evaluators. It aims to improve and extend the practice of development evaluation by refining knowledge, strengthening capacity, and expanding networks, especially in developing countries (IDEAS, 2018). The American Evaluation Association (1986) and the European Evaluation Society (1992) were established to promote the use of evaluation and to enrich its theory and practice on the two continents.

 Global Evaluation Agenda (GEA) 2016-2020

To support monitoring and evaluation for achieving the 2030 Agenda for Sustainable Development, the United Nations adopted resolution 69/237 on 19 December 2014 on “building capacity for the evaluation of development activities at the country level”. This was a step towards building global cooperation for evaluation, the year 2015 having already been declared the International Year of Evaluation (EvalYear) at the 3rd International Conference on National Evaluation Capacities at São Paulo, Brazil, in September 2013. The idea behind this was to advocate and promote evaluation and evidence-based policy making at international, regional, national, and local levels (EvalPartners, 2016).

The Global Evaluation Agenda (GEA) 2016-2020 is the first ever long-term global vision for evaluation. The GEA was developed through a broad global collaboration under the EvalPartners umbrella. Discussions around evaluation capacities and capabilities intensified during the Year of Evaluation in 2015, celebrated at 92-plus events around the world. The Year of Evaluation culminated in a historic global gathering hosted by the Parliament of Nepal in Kathmandu, where the GEA was launched and endorsed by various stakeholders including governments, parliaments, civil society, and academia, in an atmosphere of global solidarity and partnership (EvalPartners, 2016).

EvalAgenda2020 aims to strengthen the four essential dimensions of the evaluation system: the enabling environment for evaluation, institutional capacities, individual capacities for evaluation, and the inter-linkages among the first three dimensions (EvalPartners, 2016).

 

 Development Evaluation in Independent India

A system of evaluation was conceived in India along with the planned economy. With the launch of the first five-year plan in 1951, a need for systematic evaluation was felt, and the first plan deemed that systematic evaluation should become a normal administrative practice in all spheres of public activity. For this, the Planning Commission (PC) began developing evaluation techniques by establishing the Program Evaluation Organisation (PEO) for independent evaluations of community projects and other intensive area development programmes (Chandrasekar, 2015). From there, India has come a long way over the past 67 years.

Dr S. Chandrasekar served as the Director of the Regional Evaluation Office at Chennai and then as Adviser at the Directorate of Economics and Statistics, Ministry of Agriculture, New Delhi. He wrote an article about the history of development evaluation in India, published as a web special by Yojana in November 2015, around the time when a lot of changes were happening in the Indian evaluation scenario. Most of this section is based on his article and a World Bank report on the M&E system in India (Chandrasekar, 2015; Mehrotra, 2013).

 Evolution of evaluation institutions in India

The  history  of  institutionalised  development program evaluation  can  be  divided  into following phases, based on how the Government of India treated its evaluation organisations:

1. Planned economy phase 1952-1973

2. Neglect phase 1973-1995

3. Resurgence phase 1995-2013

4. New institutions and paradigm phase 2013-current

  Planned economy phase 1952-1973

The PEO was established in October 1952 as an independent organisation under the Planning Commission to evaluate development programs implemented in the first five-year plan and to bring out their successes and failures through reports. Over the first four five-year plans, PEO activities expanded considerably, and most states established their own evaluation units in the sixties for state-level programs, working in tandem with the PEO for cross-verification and learning. The scope of the PEO extended to include plan schemes/programmes in the sectors of health, agriculture and cooperation, rural industries, fisheries, family welfare, rural development, rural electrification, public distribution system, tribal development, social forestry, etc. Later, the PEO also evaluated Centrally Sponsored Schemes (CSS) (Chandrasekar, 2015).

The PEO, a field-based organisation, had a three-tiered structure: Headquarters in New Delhi at the highest level, 3 Regional Evaluation Offices at the middle level, and 20 Project Evaluation Offices at the lowest level. Beyond these were the state offices, taking the total number of offices to 40 and the staff strength to over 500. The PEO had relative autonomy, as all its offices and the state evaluation offices reported to the Director, PEO. The evaluation reports were a major part of the annual conference of State Development Commissioners, enabling follow-up actions (Mehrotra, 2013).

Neglect phase 1973-1995

With the reduction in the scope of Planning Commission activities in the early seventies, on the recommendations of the Administrative Reforms Commission, the PEO began its phase of decline and neglect. While the extent of its work was expanded to include urban areas too, the scope of its evaluations was reduced to operational, financial, and administrative aspects of schemes and programs, rather than the overall design of programs and their impacts. It was recommended that only those studies should be taken up whose results could be made available quickly for use by line divisions. This was accompanied by the appointment of Indian Economic Service officers, who are generalists compared to the earlier subject-specialist academicians, as heads of the PEO. The PEO's internal functions were merged with the Planning Commission in April 1973, reducing it to a division within a department (Chandrasekar, 2015). Around the same time, based on the recommendations of the Staff Inspection Unit of the Ministry of Finance, field offices were reduced from 40 to 27 by the end of the seventies (Mehrotra, 2013).

The PEO featured only briefly in later plans and received insufficient financial outlays, limiting its ability to bring out good reports on time. Its reports were delayed, no longer covered program impact and design, and were given less importance by the concerned ministries, thus reducing their use. This in turn reduced the number of studies being done (Chandrasekar, 2015).

Resurgence phase (1995-2013)

The resurgence in demand for evaluation can be traced to the late nineties, when the Planning Commission got involved in the design and implementation of social safety net programs to counter the adverse effects of the economic reforms initiated earlier. Unfortunately, the Fiscal Responsibility and Budget Management Act 2003 ensured that the PEO and its field offices remained highly understaffed. This began the practice of outsourcing studies to social science research institutes. From the ninth plan onwards (1997-2002), the PEO involved the ministries and subject matter expert groups in ensuring that some actions were taken based on its reports.

The eleventh five-year plan (2007-2012) stressed building online MIS for all flagship programs. A development monitoring unit was set up in the Prime Minister's Office in 2009, and a Performance Monitoring and Evaluation System (PMES) was created at the Cabinet Secretariat. The functions of monitoring and evaluation were being mixed together. A scheme named Strengthening Evaluation Capacity was launched in 2006-07 to ease the financial problems at the PEO, but it did little to address the administrative and staffing problems (Chandrasekar, 2015).

During this phase of resurgence in demand for evaluation activities, mixing up monitoring and evaluation, ignoring the plight of the PEO, underutilising studies, and outsourcing to private institutions without a clear policy were a few grave mistakes made. As a result, in 2012 there were only 6 regional and 8 project offices left (PEO, 2012).

New institutions and paradigms phase (2013-current)

A new Independent Evaluation Office (IEO) was established in the 12th plan with a mandate to “conduct evaluation of plan programmes, especially the large flagship programmes to assess their effectiveness, relevance and impact. It also has the freedom to conduct independent evaluations on any programme which has access to public funding or implicit or explicit guarantee from the government.” Instead of using the regular organised services available to government, it proposed to get evaluations done by selected institutes and researchers identified through tender processes (Chandrasekar, 2015). Not much is known about how the IEO was expected to function and how it differed from the PEO.

With the change in regime and the dissolution of the Planning Commission in 2014, the PEO and IEO were merged into the Development Monitoring and Evaluation Office (DMEO) in September 2015. In 2017, most field offices were shut down and their staff were attached to the DMEO at New Delhi (Indian Express, 2017). Even fewer details are available on official websites about this office than about the PEO (and IEO). The PMES started earlier has now been replaced by the Pragati dashboard for direct follow-up by the PMO for better implementation, but this misses any opportunity for evaluations based on the Results Framework Documents prepared by the ministries (The Economic Times, 2015).

 Concurrent evaluations

In the resurgence phase, concurrent evaluations were being regularly done by ministries themselves for their programs. For example, the National Food Security Mission under the Department of Agriculture and Cooperation, Ministry of Agriculture, was carrying out its own concurrent evaluations in 2010 (NFSM Cell, 2010), and the Ministry of Rural Development had a Concurrent Evaluation Office (CEO), set up to manage the Concurrent Evaluation Network (CENET) of MoRD in conjunction with the IEO. The CEO was closed in July 2016 (PIB, 2016).

A concurrent evaluation is either a formative or a process evaluation, carried out annually, which evaluates all the activities undertaken to achieve program objectives. Concurrent evaluations have been done in the past too; an example is the concurrent evaluation of the Integrated Rural Development Program carried out by the Department of Rural Development, Ministry of Agriculture, in 36 districts of the country from October 1985 for at least a year. As ordinary evaluations in that era were usually ex post facto, they did not provide remedial measures and mid-term corrections, so a need for concurrent evaluation was felt (Saxena, 1987). The term concurrent evaluation isn't common outside India, where the term self-evaluation is used for internal, regular evaluations (UNEP, 2008).

Current Scenario

The past decade has been very eventful for evaluation systems in India. The IEO was set up and closed, the PEO was closed, the Results Framework Document-based PMES was started and closed, and the DMEO has been started recently. This section captures the current scenario.

 

 DMEO at NITI Aayog, New Delhi

While the Development Monitoring and Evaluation Office (DMEO) was established in 2015 and NITI Aayog has a functional and updated website, very little information is available about the DMEO, even in the Digital India age. The little information available comes from a few newspaper articles and the telephone directory of NITI Aayog. While the 2016 contacts document mentions 7 regional DME offices and 8 project DME offices, the 2018 document mentions no regional or project offices (NITI Aayog, 2018). This change is also hinted at in a 2017 news report which mentions that the 15 offices were being shut down and staff called to headquarters in Delhi (Indian Express, 2017).

In the current set-up, the DMEO has a Director General at the helm, a Joint Secretary, two Deputy DGs, an Under Secretary, and staff attached to their offices. On the technical/specialist side, there are a few Senior Research Officers, Senior Statistical Officers, a Senior Consultant, and many Economics Officers, Consultants, Research Associates, and Young Professionals, a total of about 25-26 people. There is some administrative staff as well (NITI Aayog, 2018).

In 2016, the DMEO called for Expressions of Interest from research institutions, NGOs, and universities for carrying out evaluation studies. While this call for EoI is available online, the final list is not found on the NITI Aayog website. As per the mandate of the DMEO, it is expected to get evaluation studies done as requested by various ministries for their programs. This is similar to what the PEO and IEO were doing.

Evaluation in Indian states

Evaluation was an integral component of every state's planning and implementation process while the PEO was flourishing. States have taken varied paths in the past few decades since then. While evaluation is reported just as an activity under the Directorate of Economics and Statistics in the Planning Department in most states, Karnataka has an Evaluation Authority, and in Goa and Sikkim evaluation is part of the name of the directorate. When we look at the official websites, we see that evaluation occupies an important position in many states.

Across the states, evaluation is generally a function of the Planning Department, which has the Directorate of Economics & Statistics, responsible for all statistical data collection and analysis and, in most states, for monitoring and evaluation functions. Most of these functions started during the third plan period (1961-66) (PEO, 2006).

Outsourcing of evaluation studies to competent agencies has been going on for a couple of decades, and the websites, mostly developed in the last 10 years, show records of processes carried out by various states since 2012-13, under the 12th five-year plan. Unlike Maharashtra, though, very few states refer to the UN guidelines in their empanelment process.

Records of how the feedback generated by these studies is used are poor. The Program Evaluation Organisation brought out one study in 2004 and another in 2006, titled Development Evaluation in PEO and Its Impact (Vol I and Vol II), which summarise the follow-up actions taken based on the evaluation studies done in the preceding years (PEO, 2006). Beyond this, not much is documented.

Construction of Attitude Scales

“An attitude is a dispositional readiness to respond to certain institutions, persons or objects in a consistent manner which has been learnt and has become one’s typical mode of response.”

—Frank Freeman

“An attitude denotes the sum total of man’s inclinations and feelings, prejudice or bias, pre-conceived notions, ideas, fears, threats and convictions about any specific topic.”

—Thurstone

 

Steps in construction of Likert attitude scale:

1) Discussion: Informally discuss the issues with the people concerned, extension workers, experts, and NGOs, and also consult secondary sources. For example, if an investigator wants to develop a scale on attitudes towards schizophrenic patients among caregivers, discuss the topic with caregivers, staff nurses who provide care to schizophrenic patients, experts in the field such as psychiatrists, psychologists, psychiatric nurses, and psychiatric social workers, and NGOs.

2) Review: Review the literature related to the particular topic of interest. Refer to journals, books, articles, and internet sources. The literature review helps in the process of generating items for the scale.

3) Writing statements: Based on the discussion and the review, collect a set of statements on the issues. Make the items simple and straightforward so that respondents are able to fill out the scale quickly and easily.

Writing positive and negative statements: Write acceptance or rejection statements; each should imply a different degree of favourable or unfavourable attitude towards the issue the investigator intends to assess. A statement or item can be positive or negative. Positive statements should be statements which are acceptable to those having the attitude and just as unacceptable to those not having it. For example, “I frequently use library resources to go beyond the required reading.” Negative statements should be statements which are acceptable to those not having the attitude and just as unacceptable to those having it. For example, “Homework assignments are designed to meet course requirements. It is impractical in time and energy to do more than is required.”

4) Create an item pool: Continue writing items, both positive and negative, until the item pool is at least twice the size of the intended instrument. For example, if an investigator plans to have 20 items in the final scale, create an item pool of 40 items.

5) Editing of items: After having collected as many relevant statements as possible, the next step is to go through each item carefully. Criteria for editing:

·        Avoid statements that refer to the past rather than to the present. For example, “At one time small pox affected a large number of people.”

·        Avoid statements that are factual or capable of being interpreted as factual. For example, “Using PowerPoint slides is a modern medium in educational technology.”

·        Avoid statements likely to be endorsed by almost everyone or by no one. For example, “Admission to a private hospital proves expensive.”

·        Avoid statements irrelevant to the object under consideration. For example, “In future there will surely be a treatment for AIDS.”

·        Avoid statements containing more than one thought, and avoid double negatives. For example, “Most people do not think that AIDS does not cure.”

·        Avoid words that may not be understood by the respondents. For example, “Depot injection is a more advanced form.”

·        Avoid universals such as all, always, none, never, often, etc., as these introduce ambiguity. Also avoid words such as only, just, merely, etc.

·        Avoid biased language; it is important to avoid using emotional words or phrases in items.

·        Avoid double-barrelled questions, where the item actually combines two different questions into one. For example, “Do you think that the nursing service department is prompt and helpful?” Any item that includes the word “and” should be closely examined to see whether it is actually a double-barrelled question.

·        Avoid non-monotonic questions, where people could give the same answer for different reasons. For example, “Only people in the nursing profession should be allowed to wear white uniforms.” Some could disagree with this item either because they feel that others besides nurses should be allowed to wear white uniforms, or because they feel that no one should be allowed to wear white uniforms.

6) Rank: After editing, rate the items and rank order them on clarity and potency, then select the items. Choose the number of response categories: five categories are fairly standard; some scale constructors use seven, and some prefer four or six response categories with no middle category. All of these seem to work satisfactorily.

7) Scoring: The points given for each response depend on whether the statement is positive or negative. A person who strongly agrees with a positive statement gets the maximum points; one who strongly disagrees with a positive statement gets the minimum points. For the purpose of scoring, assign the numerical value of 5 to strongly agree, 4 to agree, 3 to undecided, 2 to disagree, and 1 to strongly disagree. If the item is negative, reverse the order of scoring: 5 for strongly disagree, 4 for disagree, 3 for undecided, 2 for agree, and 1 for strongly agree.
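As a simple illustration of this scoring rule, here is a minimal Python sketch; the item identifiers, responses, and the set of negatively worded items are hypothetical examples, not part of any particular published scale:

```python
# Minimal sketch of Likert scoring with reverse-coded negative items.
# Item identifiers and responses below are hypothetical illustrations.

SCORE = {"SA": 5, "A": 4, "U": 3, "D": 2, "SD": 1}

def score_item(response: str, negative: bool) -> int:
    """Return the score for one response; reverse the scale for negative items."""
    value = SCORE[response]
    return 6 - value if negative else value  # 5<->1, 4<->2, 3 stays 3

def total_score(responses: dict, negative_items: set) -> int:
    """Sum item scores over one respondent's answers."""
    return sum(score_item(r, item in negative_items) for item, r in responses.items())

# Hypothetical respondent: items q1..q4, where q3 and q4 are negative statements.
responses = {"q1": "SA", "q2": "A", "q3": "D", "q4": "SD"}
print(total_score(responses, negative_items={"q3", "q4"}))  # 5 + 4 + 4 + 5 = 18
```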

8) Write instructions which clearly explain how to select responses on the form. Write in simple and easily understandable language.

9) Formatting the scale: Randomly order the selected items. Use letters to indicate the choices, such as SD, D, U, A, SA.
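Random ordering of items can also be done programmatically. A minimal sketch, reusing the two example statements given earlier; the layout shown is only illustrative:

```python
# Minimal sketch: randomly order the selected items and present the
# standard response choices. The statements below are the earlier examples.
import random

items = [
    "I frequently use library resources to go beyond the required reading.",
    "Homework assignments are designed to meet course requirements; it is "
    "impractical in time and energy to do more than is required.",
]
choices = ["SD", "D", "U", "A", "SA"]   # strongly disagree ... strongly agree

random.shuffle(items)                   # randomise presentation order
for number, statement in enumerate(items, start=1):
    print(f"{number}. {statement}  [{' / '.join(choices)}]")
```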

10) Validity: Validity is the extent to which the measure provides an accurate representation of what one is trying to measure. Validity concerns both systematic and variable error components. A systematic error, also known as bias, is one that occurs in a consistent manner each time something is measured. For example, a biased question would produce an error in the same direction each time it is asked; such an error would be a systematic error. A variable error is one that occurs randomly each time something is measured. For example, a response that is less favourable than the true feeling because the respondent was in a bad mood (a temporary characteristic) would not occur each time that individual's attitude is measured; in fact, an error in the opposite direction (overly favourable) would occur if the individual were in a good mood. This represents a variable error.
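The contrast between systematic and variable error can be made concrete with a small simulation. This is a minimal sketch, assuming a hypothetical "true" attitude score, a constant bias standing in for systematic error, and random noise standing in for variable error:

```python
# Minimal sketch contrasting systematic error (constant bias) with
# variable error (random noise). All numbers are hypothetical.
import random

random.seed(0)
true_score = 40                     # the respondent's stable "true" attitude score

# Systematic error: every measurement is shifted in the same direction.
biased = [true_score + 3 for _ in range(5)]          # always 3 points too favourable

# Variable error: each measurement is shifted randomly, in either direction.
noisy = [true_score + random.randint(-3, 3) for _ in range(5)]

print("systematic error:", biased)   # identical values, all wrong by the same amount
print("variable error:  ", noisy)    # values scatter around the true score
```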

 

11) Reliability: The term reliability refers to the degree of variable error in a measurement. Reliability is the extent to which a measurement is free of variable errors. This is reflected when repeated measures of the same stable characteristic in the same objects show limited variation.

NEED FOR EVALUATION MODELS


Evaluation is an integral part of most instructional designs. Evaluation tools and methodologies help to determine the effectiveness of instructional interventions. There are different types of evaluation models.

 

Definition:

“Evaluation models either describe what evaluators do or prescribe what they should do.” An evaluation model is a systematic approach that guides the measurement of the efficiency and effectiveness of a training, a course, or an educational program.

 

Different models target different things, but in general they look at questions such as:

·       Was the training successful?

·       What did the participants learn?

·       Did the participants use what they learned on-the-job?

·       What was the impact on the organization?

·       Was the training a good investment?

·       Did the training offer value for money?

·       Could the training be improved?

 

 

In general, evaluation models:

·       Provide a systematic method to study a program, practice, intervention, or initiative to understand how well it achieves its goals.

·       Offer systematic frameworks for investigating and analyzing the effectiveness of training or learning journeys.

·       Suggest improvements for continued efforts.

·       Seek support for continuing the program.

·       Gather information on the approach that can be shared with others.

·       Help determine whether an approach would be appropriate to replicate in other locations with similar needs.

PROCEDURE OF CONSTRUCTION OF SPECIAL APTITUDE TESTS


Aptitude is defined as the natural, learned or acquired ability to do something. It is the readiness of an individual, based on his willingness and ability, to acquire some skill or knowledge particular to certain activities. Knowledge of aptitude can help us to predict an individual's future performance. Hence, an aptitude assessment looks at one or more clearly defined and relatively homogeneous segments of ability. It assesses a test taker's potential for learning or ability to perform in a new situation based upon their cumulative life experiences. An aptitude test is designed to assess what a person is capable of doing or to predict what a person is able to learn or do given the right education and instruction. It represents a person's level of competency to perform a certain type of task. Such aptitude tests are often used to assess academic potential or career suitability and may be used to assess either mental or physical talent in a variety of domains.

Examples of Aptitude Tests

 Some examples of aptitude tests include:

 · A test assessing an individual's aptitude to become a fighter pilot

· A career test evaluating a person's capability to work as an air traffic controller

· An aptitude test given to high school students to determine which types of careers they might be good at

 · A computer programming test to determine how a job candidate might solve different hypothetical problems

 · A test designed to test a person's physical abilities needed for a particular job such as a police officer or firefighter

 

Students often encounter a variety of aptitude tests throughout school as they think about what they might like to study in college or do as a career someday. These tests are designed to help them determine what they should study or pursue, and they can sometimes give a general idea of what might interest a student as a future career. For example, a student might take an aptitude test suggesting that they are good with numbers and data. Such results might imply that a career as an accountant, banker, or stockbroker would be a good choice for that particular student. Another student might find that they have strong language and verbal skills, which might suggest a career as an English teacher, writer, or journalist. Aptitude tests may be specialized according to the skill or ability measured, such as artistic ability, manual dexterity, clerical skills, and motor abilities, or they may be general.

 

Structure of a basic Aptitude Tests

Usually, basic aptitude tests are divided into sections that gauge numerical ability, logical reasoning, verbal comprehension, spatial awareness and cognitive ability. These sections can vary, depending on the qualities sought by an employer or institute. However, the elements common to most versions of ability and aptitude tests are listed below.

  • Most of these tests contain multiple-choice questions.
  • There can also be mathematical equations and true/false question formats.
  • The questions are designed to assess a candidate’s ability to process information quickly and devise accurate solutions/answers.
  • Candidates are expected to finish every section within a fixed duration.

Special Aptitude Tests

 Special aptitude tests are those designed to look at an individual's capacity in a particular area. For example, imagine that a business wants to hire a computer programmer to work for their company. They will likely look at a range of things including work history and interview performance, but they might also want to administer an aptitude test to determine if job candidates possess the necessary skill to perform the job. This special aptitude test is designed to look at a very narrow range of ability: how skilled and knowledgeable the candidate is at computer programming.

 

The different Special Aptitude Tests are as under:

(A)  Mechanical Aptitude Test: Like intelligence, mechanical aptitude is also made up of many components.

A number of tests are available for measuring mechanical aptitude for a fairly large field of occupations rather than for a single occupation.

·       Minnesota Mechanical Assembly Test.

·       Minnesota Spatial Relations Test.

·       Minnesota Paper Form Board

·       Johnson O’Connor’s Wiggly Blocks.

·       Sharma’s Mechanical Aptitude Test Battery.

·       Stenquist Mechanical Aptitude Tests, etc.

These tests usually include the following types of items:

·       Asking the subject to put together the parts of mechanical devices

·       Asking him to replace cutouts of various shapes in corresponding spaces on a board

·       Solving geometrical problems

·       Questions concerning the basic information about tools and their uses

·       Questions relating to the comprehension of physical and mechanical principles

For instance, the Bennett mechanical comprehension test has 60 items in pictorial form. They present mechanical problems arranged in order of difficulty and involve comprehension of mechanical principles found in ordinary situations.

 

(B)  Clerical Aptitude Tests: Like mechanical aptitude, clerical aptitude is also a composite function. According to Bingham, it involves several specific abilities, namely:

·       Perceptual ability. The ability to register words and numbers with speed and accuracy.

·       Intellectual ability. The ability to grasp the meaning of words and symbols.

·       Motor ability. The ability to use various types of machines and tools like a typewriter, duplicator, cyclostyle machine, etc.

A number of tests are available for measuring clerical aptitude:

·       Minnesota Clerical Aptitude Test.

·       General Clerical Aptitude.

·       The Detroit Clerical Aptitude Examination.

·       P.R.W. Test.

·       Orissa Test of Clerical Aptitude.

·       Clerical Aptitude Test

 

(C) Tests of Artistic Aptitude: Some tests have been devised to measure artistic aptitude.

Some such tests are listed below:

i. Graphic Arts Test: These tests are devised to discover the talent for graphic art

ii. Musical Aptitude Tests:

iii. Literary Aptitude Tests:

 

(D) Professional Aptitude Tests: These tests primarily measure aptitude for different professions. Such tests are administered before admission into professional institutions like medical, legal, engineering institutions. There are many tests to measure aptitude in medicine, science, mathematics, law, engineering, teaching etc.

 

(E) Scholastic Aptitude Tests: These tests measure the scholastic aptitudes. Some examples of such tests are Scholastic Aptitude Tests and Graduate Record Examination.

 

(F) Other Tests like Motor Dexterity Tests: Other Tests like Motor Dexterity Tests, Sensory

 Tests, Visual Tests and Auditory Tests.

Test construction.

The process of constructing aptitude tests involves a rather technical sequence combining the ingenuity of the psychologist, experimentation and data collection with suitable samples of individuals, the calculation of quantitative indexes for items and total test scores, and the application of appropriate statistical tests at various stages of test development. Some of the indexes applied in the construction phase are difficulty levels, the proportion of responses actually made to the various alternatives provided in multiple-choice tests, and the correlation of item scores with total test scores or with an independent criterion. A well-developed aptitude test goes through several cycles of these evaluations before it is even tried out as a test. The more evidence there is in the test manual of such rigorous procedures, the more confidence we can have in the test.
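Two of the indexes mentioned above can be illustrated with a small item-analysis sketch. This is a minimal illustration under assumptions, not the procedure of any particular test battery: the 0/1 response matrix is hypothetical, and statistics.correlation requires Python 3.10 or later.

```python
# Minimal sketch of two common item-analysis indexes:
#   difficulty level = proportion of examinees answering the item correctly
#   item-total r     = correlation of item scores with total test scores
# The response matrix below (rows = examinees, 1 = correct) is hypothetical.
from statistics import correlation  # Python 3.10+

responses = [
    [1, 1, 0, 1],
    [1, 0, 0, 1],
    [0, 1, 0, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 1],
]
totals = [sum(row) for row in responses]          # total score per examinee

for item in range(len(responses[0])):
    item_scores = [row[item] for row in responses]
    difficulty = sum(item_scores) / len(item_scores)
    r = correlation(item_scores, totals)          # item-total correlation
    print(f"item {item + 1}: difficulty = {difficulty:.2f}, item-total r = {r:.2f}")
```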

There are other problems that generally must be considered in evaluating test scores. Before a test is actually used, a number of conditions have to be met. There is a period of “testing the tests” to determine their applicability in particular situations. A test manual should be devised to provide information on this. Furthermore, there is the question of interpreting a test score.

Standardization. The concept of standardization refers to the establishment of uniform conditions under which the test is administered, ensuring that the particular ability of the examinee is the sole variable being measured. A great deal of care is taken to ensure proper standardization of testing conditions. Thus, the examiner’s manual for a particular test specifies the uniform directions to be read to everyone, the exact demonstration, the practice examples to be used, and so on. The examiner tries to keep motivation high and to minimize fatigue and distractions.

Reliability. One of the most important characteristics of a test is its reliability. This refers to the degree to which the test measures something consistently. If a test yielded a score of 135 for an individual one day and 85 the next, we would term the test unreliable. Before psychological tests are used they are first evaluated for reliability. This is often done by the test-retest method, which involves giving the same test to the same individuals at two different times in an attempt to find out whether the test generally ranks individuals in about the same way each time. 
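The test-retest method described above amounts to correlating the scores from the two administrations. A minimal sketch, assuming hypothetical score pairs and Python 3.10+ for statistics.correlation:

```python
# Minimal sketch: test-retest reliability as the correlation between
# two administrations of the same test. Scores below are hypothetical.
from statistics import correlation  # Python 3.10+

test_scores   = [72, 65, 80, 58, 90, 77]   # first administration
retest_scores = [70, 66, 78, 60, 88, 75]   # second administration

r = correlation(test_scores, retest_scores)
print(f"test-retest reliability estimate: r = {r:.2f}")  # close to 1.0 = rankings preserved
```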

Validity. An essential characteristic of aptitude tests is their validity. Whereas reliability refers to consistency of measurement, validity generally means the degree to which the test measures what it was designed to measure. A test may be highly reliable but still not valid.

The selection ratio. Another important factor affecting the success of aptitude tests in personnel selection procedures is the selection ratio. This is the ratio of those selected to those available for placement. If there are only a few openings and many applicants, the selection ratio is low; and this is the condition under which a selection program works best.

GOAL ATTAINMENT MODEL

 

Ralph W. Tyler (1950) proposed a goal attainment model. Tyler describes education as a process in which three different foci should be distinguished: educational objectives, learning experiences, and examination of achievements. According to him, evaluation means an examination of whether the desired educational objectives have been attained or not. The Tyler model has been used mainly to evaluate the achievement level of either individuals or groups of students. The evaluator working with this model is interested in the extent to which students are developing in the desired way. The relationship between educational objectives and students' achievement constitutes only a portion of the model; the study of the other relationships described in the model also forms part of curriculum evaluation.

Tyler's goal attainment model, sometimes called the objective-centered model, is the basis for the most common models in curriculum design, development, and evaluation.

 

Major parts of Tyler model

 

The Tyler model comprises four major parts. These are:

 

1) Defining objectives of the learning experience

 

2) Identifying learning activities for meeting the defined objectives

 

3) Organizing the learning activities for attaining the defined objectives

 

4) Evaluating and assessing the learning experiences.

 

 

 

 

The Tyler model begins by defining the objectives of the learning experience. These objectives must have relevance to the field of study and to the overall curriculum. Tyler's model obtains the curriculum objectives from three sources:

·             The student

·             The society, and

·             The subject matter.

Nature and characteristics of Tyler's objective model

 

«   The nature of Tyler's objective model is that it evaluates the degree to which an instructional program's goals or objectives were achieved.

 

«    The model mainly involves the careful formulation of goals from three sources (the student, the society, and the subject matter), filtered through two goal screens (a psychology of learning and a philosophy of education).

 

«    The resulting goals are then transformed into measurable objectives.

With Tyler's evaluation, the evaluator can determine the level to which the objectives of the program are achieved; attained objectives indicate a successful instructional program. However, the objectives may change during the implementation of the program, or the program may not have clear objectives at all.

 

«  Tyler's objectives model can only be used to evaluate programs with clear and stable objectives.

 

 

 

 

              Criticism of Tyler's goal attainment model

 

«   The first criticism is that it is difficult and time-consuming to construct behavioural objectives. Tyler's model relies mainly on behavioural objectives. The objectives in Tyler's model come from three sources (the student, the society, and the subject matter), and all three sources have to agree on which objectives need to be addressed. This is a cumbersome process; thus, it is difficult to arrive at a consensus among the various stakeholder groups.

«       The second criticism is that it is too restrictive and covers only a small range of student skills and knowledge.

«      The third criticism is that Tyler's model is too dependent on behavioural objectives, and it is difficult to state plainly in behavioural terms those objectives that cover non-specific skills such as critical thinking and problem solving, and objectives related to value-acquisition processes.

«   The fourth and last criticism is that the objectives in Tyler's model are too student-centered, and therefore teachers are not given any opportunity to manipulate the learning experiences as they see fit to evoke the kind of learning outcomes desired.
