Seeing rare birds where there are none: self-rated expertise predicts correct species identification, but also more false rarities

The use of crowdsourced data is growing rapidly, particularly in ornithology. Citizen science greatly contributes to our knowledge; however, little is known about the reliability of data collected in that way. We found, using an online picture quiz, that self-proclaimed expert birders were more likely to misidentify common British bird species as exotic or rare species, compared to people who rated their own expertise more modestly. This finding suggests that records of rare species should always be considered with caution, even if the reporters consider themselves to be experts. In general, however, we show that self-rated expertise in bird identification skills is a reliable predictor of correct species identification. Implementing the collection of data on self-rated expertise is easy and low-cost. We therefore suggest it as a useful tool to statistically account for variability in bird identification skills of citizen science participants and to improve the accuracy of identification data collected by citizen science projects.


Introduction
The use of crowdsourced data is growing rapidly (1,2), particularly in ornithology (3). Citizen science data collection (4) greatly contributes to our knowledge of species distribution, population dynamics (4), the assessment of extinction risks (5) and conservation decision making (6). However, while the correct identification of species is fundamental for the reliability of these data (7), little is known about the variation in the identification skills of the contributors and the error this introduces. Visual identification is still the most efficient and reliable method for identifying most bird species (8), yet it relies on the expertise and skill of the observer. Thus, reliance on non-expert species identification, for example in citizen science projects, means that errors will be made. Identification errors can have serious consequences (9). For example, mistaking an endangered species (Takahe, Porphyrio hochstetteri) for one that needs to be managed by culling can lead to wasted conservation efforts (10). As citizen science data often form the basis for conservation policies and management plans (6), it is imperative to quantify the extent of these errors. Concerningly, however, few such studies have been conducted. One rare example is a study showing that expert and non-expert bumblebee species identification are similarly reliable (11), whereas experience predicts correct species identification in mussels (7). The validity of bird species identification skills, however, remains largely unexplored, and most citizen science projects on birds do not collect information on participants (but see (12)). This is despite many hobbyist ornithologists contributing to large citizen science projects (13,14).
Yet the popularity of birdwatching (15,16) and the number of people able and willing to contribute to bird citizen science projects hold immense potential for ornithological research (17). Here, we provide what is, to the best of our knowledge, the first quantification of visual bird species identification accuracy, with an exceptionally large sample size. We test the hypothesis that people who rate their own expertise in identifying common bird species more highly also correctly identify more birds from pictures. We used an online bird identification questionnaire that presented 2,697 people with four pictures of each of six common British bird species.

Results

Descriptive statistics
Our online bird identification questionnaire resulted in 64,728 identification attempts by 2,697 potential citizen scientists. We asked participants to rate their own expertise on a five-point scale, hereafter self-rated expertise (1 = 'Novice', 2 = 'Little experience with wild birds (feeders in garden, etc.)', 3 = 'Intermediate', 4 = 'Experience with a wide range of British species, especially common birds', 5 = 'Experience with most species in Britain (including waders, gulls, etc.) and abroad (e.g. Western Palearctic)'). We also asked participants whether they had externally certified expertise (e.g. being trained and licensed as a bird ringer), and about their previous experience in bird surveys. Overall, 78% of the pictured birds were correctly identified.

The probability of an incorrect answer decreased statistically significantly with higher self-rated expertise (Table 1). Self-rated novices (1 on the scale) correctly identified on average 35% of the pictures, while self-rated experts (5 on the scale) correctly identified 95% of all pictures (Fig. 1). While externally certified expertise and previous experience in bird surveys both statistically significantly predicted the probability of correctly identifying a species in a picture, self-rated expertise was a more reliable and precise predictor of correct species identification (Table 1).

Incorrect identifications
Inaccurate answers included the acknowledgement of not knowing the answer, and incorrect identifications. Most incorrect identifications referred to other species common in Britain. Surprisingly, despite the title of the questionnaire "Common British birds: identification quiz" and the introductory text explicitly stating that we sought to assess identification skills of common British birds, 113 participants (4.2%) identified at least one of the birds in the pictures as a rarity in Britain, or as a species that has never been reported as wild in Britain (i.e. exotic species, Fig. 2A). Notably, participants who suggested rarities or exotics rated their expertise statistically significantly higher than people who did not suggest rare or exotic bird species, and were also more likely to use references such as bird guide books or websites for help (Fig. 2B). People with higher self-rated expertise are expected to be familiar with a greater number of species, and may therefore be expected to consider more possible species than novices.

Fig. 2. [...] The total number of participants who identified at least one species in a picture as a rare or exotic species (black line, right y-axis). The percentage of participants using reference material like a bird guide book (left y-axis) was higher among participants that inaccurately identified rare or exotic bird species (dark grey bars) than among those that did not identify rare or exotic bird species (light grey bars). Parameter estimates (95% CI) of a binomial linear model with rare/exotic species suggested (1 = yes) as response variable: b_intercept = -5.23 (-6.81 to -4.08), b_self-rated expertise = 1.34 (0.78 to 1.87), b_used reference = 0.41 (0.20 to 0.63), N = 2,697 participants. Externally certified expertise and previous experience in bird surveys were not associated with seeing rare or exotic species.
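To make the reported parameter estimates easier to read, they can be converted to odds ratios. The following is a minimal sketch, assuming that the binomial model used a logit link, so that the coefficients are on the log-odds scale, and that the covariates entered the model unscaled; neither is stated explicitly in the text. The function name `odds_ratio` and the dictionary `ESTIMATES` are ours, introduced only for illustration.

```python
import math

# Estimates and 95% CI bounds as reported for the binomial model of
# rare/exotic suggestions: (estimate, lower bound, upper bound).
ESTIMATES = {
    "self-rated expertise": (1.34, 0.78, 1.87),
    "used reference": (0.41, 0.20, 0.63),
}


def odds_ratio(log_odds: float) -> float:
    """Convert a coefficient on the log-odds scale to an odds ratio.

    Assumption: the model used a logit link (not stated in the text).
    """
    return math.exp(log_odds)


for name, (estimate, lower, upper) in ESTIMATES.items():
    print(
        f"{name}: OR = {odds_ratio(estimate):.2f} "
        f"(95% CI {odds_ratio(lower):.2f} to {odds_ratio(upper):.2f})"
    )
```

Under these assumptions, each one-step increase in self-rated expertise would multiply the odds of suggesting a rare or exotic species by roughly 3.8.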

Discussion
We found that while, in general, self-rated expertise in identifying common bird species did predict the number of correctly identified images, self-rated experts were more likely to identify a common bird species as a rare or exotic species than people who rated their own expertise more modestly. The incentive of "ticking" (birdwatching terminology for one's first observation of a species) as many species as possible, for a potentially ever-growing personal list of observed species, appears to be common in birdwatching, although this has not been quantified. To the best of our knowledge, the only study on this topic found no impact of the incentive of personal species list growth on the number of reported false positives in acoustic bird species identification (18). However, overconfidence certainly could explain the report of a Scottish Crossbill (Loxia scotica) in our dataset, as this species is not identifiable by sight alone (19). Future research should therefore aim at understanding the underlying causes of the different identification patterns among the different expertise levels.

In conclusion, self-rated expertise is a good indicator of performance and can provide valuable information to any citizen science project involving species identification. We suggest that citizen science projects should evaluate self-rated expertise with a simple questionnaire. The data collected in this way can then be used to statistically account for variation in observer expertise, for instance by using a weighted statistic. We suggest that such an approach should be standard procedure in any citizen science or crowdsourced project that relies on species identification, to increase the precision, reproducibility, and generality of our science.
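One way a weighted statistic could look is sketched below: each record is weighted by the expected identification accuracy of the observer's self-rated expertise level. This is an illustration, not the procedure used in this study. The weights for levels 1 and 5 are the accuracies reported in the Results (35% and 95%); the weights for levels 2 to 4 are hypothetical placeholders, and the function name `weighted_species_support` is ours.

```python
from collections import defaultdict

# Weight per self-rated expertise level. Levels 1 and 5 use the accuracies
# reported in the Results; levels 2-4 are HYPOTHETICAL placeholder values.
ACCURACY_BY_EXPERTISE = {1: 0.35, 2: 0.50, 3: 0.65, 4: 0.80, 5: 0.95}


def weighted_species_support(records):
    """Return each species' share of the total accuracy-weighted records.

    records: iterable of (species_name, self_rated_expertise) tuples,
    where self_rated_expertise is an integer from 1 to 5.
    """
    support = defaultdict(float)
    total = 0.0
    for species, level in records:
        weight = ACCURACY_BY_EXPERTISE[level]
        support[species] += weight
        total += weight
    if total == 0.0:
        return {}
    return {species: weight / total for species, weight in support.items()}


# Example: one expert and one novice report a Robin, one novice a Dunnock.
example = [("Robin", 5), ("Robin", 1), ("Dunnock", 1)]
print(weighted_species_support(example))
```

In a real project the weights would be estimated from validation data, such as a quiz like the one used here. Because self-rated experts were more likely to suggest rarities, records of rare species may need separate treatment.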

Ethics statement
Approval for this study was granted by Prof Barraclough, as representative for the Imperial College Research Ethics Committee. All research was performed in accordance with relevant guidelines and regulations. All response forms were anonymous, and formal, informed consent was obtained.

Questionnaire
The complete questionnaire is provided as Online Supplementary [...] sourced from the sighting collaborative website observations.be. The plumage differences between British and Belgian birds of the species we selected are negligible (20). We also included one drawing per species that was similar to those presented in bird guide books. The drawings were sourced from the RSPB website with written permission from the artist, Mike Langman. All participants were informed that the questionnaire only concerned common birds in Great Britain. It was not possible to zoom in on the pictures.

Participant sourcing
Using newsletters ("BTO BirdTrack" and "Wildlife in Ascot") and social media (Facebook and Twitter), participants were presented with a short explanation of the aims of the study and a clarification that all levels of expertise are relevant. The questionnaire was shared in Facebook groups dedicated to the topic (e.g. UK Bird Identification, Birding UK and Ireland).

Data coding
Species identifications were submitted as free-text answers, subsequently checked for spelling mistakes and synonyms, and coded using a numeric code (correct, inaccurate). All answers were coded twice by NB and cross-checked to account for human error during coding. Correct species names were accepted even if followed by a question mark, an inaccurate sex or similar. Only for the House Sparrow (Passer domesticus) was the genus name "sparrow" accepted as a correct answer.
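The coding rules above can be sketched in code. In this study the answers were coded and cross-checked by hand, so the following is only an illustration of how such rules might be automated. The synonym table is hypothetical, since the list of synonyms actually accepted is not given here, and spelling mistakes would still need manual checking or additional table entries. The numeric codes (0 = correct, 1 = inaccurate) follow the response variable described under Statistical analysis.

```python
import re

# HYPOTHETICAL synonym table; the study's actual list is not given in the text.
SYNONYMS = {
    "european robin": "robin",
    "common starling": "starling",
    "eurasian blue tit": "blue tit",
    "bluetit": "blue tit",
    "green finch": "greenfinch",
    "european greenfinch": "greenfinch",
    "common chaffinch": "chaffinch",
}


def normalise(answer: str) -> str:
    """Lower-case a free-text answer and strip remarks that do not matter."""
    text = answer.lower()
    text = re.sub(r"\(.*?\)", " ", text)   # drop parenthetical remarks
    text = re.sub(r"[^a-z\s]", " ", text)  # drop punctuation such as "?"
    # Sex or age qualifiers are ignored, even when they are wrong.
    text = re.sub(r"\b(male|female|juvenile|adult)\b", " ", text)
    text = " ".join(text.split())          # collapse whitespace
    return SYNONYMS.get(text, text)


def code_answer(answer: str, target: str) -> int:
    """Return 0 for a correct identification and 1 for an inaccurate one."""
    name = normalise(answer)
    if name == target:
        return 0
    # Only for the House Sparrow is the bare name "sparrow" accepted.
    if target == "house sparrow" and name == "sparrow":
        return 0
    return 1


print(code_answer("Robin?", "robin"))                  # question mark ignored
print(code_answer("Female House Sparrow", "house sparrow"))
print(code_answer("Tree Sparrow", "house sparrow"))
```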

Descriptive statistics
Of all 2,697 participants, 66 rated their own expertise as 'Novice' (coded 1), and 333 described their own expertise as 'Little experience with wild birds (feeders in garden, etc.)' (coded 2). A further 793 participants considered their own expertise as 'Intermediate' (coded 3), and 1,072 rated themselves as having 'Experience with a wide range of British species, especially common birds' (coded 4). Finally, 433 participants considered themselves experts, described as 'Experience with most species in Britain (including waders, gulls, etc.) and abroad (e.g. Western Palearctic)' (coded 5). We then asked whether participants had previous experience in bird surveys (1,277 participants (47.3%) answered positively) and whether they had been externally certified. We found that 220 participants (7.4%) either had a ringing licence or were a validator on a sighting collection website or similar.

Of all participants, 93.3% were from Britain, 6.1% from other European countries, and 0.4% from outside Europe. Of all participants, 1,661 were male and 1,018 were female, with 18 participants scored as neither or do not want to say. Only in self-rated expertise category 4 ('Experience with a wide range of British species, especially common birds') was there a significant difference between men and women in correctly identifying species in pictures (two-sided t = -2.84, df = 1068, p = 0.005; all gender comparisons in all other self-rated expertise categories: 0.96 > t > -1.68 and p > 0.10). However, note that, owing to the large sample size, the data have high statistical power to detect small effects. Here, the effect size was minimal and potentially not biologically important: women in self-rated expertise category 4 scored on average 20.1 correct out of 24 pictures shown, while men scored 20.7.

Statistical analysis
To test whether self-rated expertise, externally certified expertise, and previous survey experience predicted the probability of correctly identifying bird pictures, we used a generalised linear mixed model (GLMM) with a logit link function. The response variable was binary for each picture: species correctly identified (0) or inaccurately identified (1). The five-level self-rated expertise (1 = non-expert, 5 = expert) was modelled as a fixed covariate. Externally certified expertise and previous experience were added as two-level fixed factors. Some species may be easier to identify than others. We indeed found that, on average, starlings were least likely to be correctly identified (44% inaccurate identifications), followed by greenfinch (27%), chaffinch (21%) and house sparrow (18%). Robins (11%) and blue tits (9%) were most likely to be correctly identified. Therefore, we modelled species as a random effect. To account for variation between participants and for pseudo-replication, we modelled participant ID as a random effect on the intercept. We accounted for the fact that some pictures may have been easier to identify than others by modelling picture ID as a random effect on the intercept.

We found a statistically significant difference between the probability of correctly identifying a drawing and a photograph (χ² test: χ² = 114.8, df = 1, p < 0.0001). Note that the low p-value stems from the large sample size and thus high statistical power to detect small effects. Indeed, the actual difference between both categories was minimal (% inaccurately identified: photos 21.9%, drawings 21.0%) and likely irrelevant. In any case, the random effect of picture ID statistically corrects for any difference between photos and drawings. We used Bayesian mixed models and the R package MCMCglmm (21) to fit the GLMMs, as these account well for over-dispersion in the data.
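The model structure described above can be summarised in one equation. The notation is ours and is meant as a sketch of the linear predictor, not as the exact MCMCglmm parameterisation: i indexes participants, j indexes pictures, s(j) is the species shown in picture j, and y = 1 denotes an inaccurate identification. The sketch omits the residual term and assumes normally distributed random intercepts.

```latex
\operatorname{logit}\,\Pr(y_{ij} = 1)
  = \beta_0
  + \beta_1\,\mathrm{expertise}_i
  + \beta_2\,\mathrm{certified}_i
  + \beta_3\,\mathrm{survey}_i
  + u_i + v_{s(j)} + w_j,
\qquad
u_i \sim \mathcal{N}(0, \sigma_u^2),\quad
v_s \sim \mathcal{N}(0, \sigma_v^2),\quad
w_j \sim \mathcal{N}(0, \sigma_w^2)
```

Here u, v and w are the random intercepts for participant ID, species and picture ID, respectively. A negative estimate of the expertise coefficient corresponds to fewer inaccurate identifications at higher self-rated expertise.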
We used an inverse Wishart prior for the random effects. The residual variance is not identifiable when using binary data; therefore, we used the prior to fix it to 1. The models were run with 75,000 iterations and the default burn-in parameter. We report posterior means as parameter estimates, and 95% credible intervals. We used a t-test to test whether people who reported rare or non-British birds had higher self-rated expertise. All analyses were conducted in R version 3.5.0 (22