Everyday bat vocalizations contain information about emitter, addressee, context, and behavior

Yosef Prat, Mor Taub & Yossi Yovel

Animal vocal communication is often diverse and structured. Yet, the information concealed in animal vocalizations remains elusive. Several studies have shown that animal calls convey information about […] to probe an entire vocal repertoire at once. In this study, we continuously monitored Egyptian fruit bats […] individuals, rather than broadcast. We found that bat vocalizations carry ample information about the identity of the emitter, the context of the call, the behavioral response to the call, and even the call's addressee. Our results underline the importance of studying the mundane, pairwise, directed vocal interactions of animals.

Many animal species use vocal communication1, but the chaotic nature of the acoustics of these vocalizations often complicates their cataloging into clearly divided types and contexts2. Understanding the information encapsulated in animal vocalizations is central to the study of sociality, communication, and language evolution. Yet, in research on nonhuman animals, the purpose and meaning of the vocal message often remain obscure. Researchers of animal communication, seeking a homology to semantics, may relate behavioral observations to acoustic measurements and thus reveal some of the information content of vocal expressions. Indeed, several studies have described vocalizations as functionally referential, i.e., signals that are both specific to a certain context and elicit an appropriate response in a receiver3. However, the requirement for a specific response might be too strict, ignoring many other information-bearing vocalizations. Many vocalizations evoke a response according to their context or their receiver's internal state6, and not solely based on their acoustics.
Thus, the information capacity of animal vocalizations should still be explored, even when interpreting the behavioral responses is not straightforward7. Identifying context-specific calls can be a first step toward recovering meaning in animal communication. For example, many studies have indicated the existence of context-specific vocalizations in various taxa (e.g., birds8, cetaceans11,12, dogs13, bats14,15, and nonhuman primates16). Notably, the distinctive alarm calls of some species are acoustically unique to different types of danger17. Other studies have demonstrated the ability to distinguish between emitting individuals based on acoustic features of their vocalizations20, while several studies, mainly in primates, revealed subtle context-dependent acoustic modifications of calls in everyday behaviors, such as food-related behaviors32, agonistic interactions33, and long-distance calls29.

Bats are social mammals that often roost in very large colonies and commonly engage in social interactions. They rely heavily on vocalizations for social communication14,27,34, yet little is known about the function and informational content of these vocalizations. In this study, we used the extremely social and vocal Egyptian fruit bat (Rousettus aegyptiacus) to elucidate some of the large potential information capacity of a mammalian vocal communication system. Egyptian fruit bats are social mammals that aggregate in groups of dozens to thousands of individuals, can live to the age of at least 25 years35, and are capable of vocal learning36. We housed groups of bats in acoustically isolated chambers and continuously monitored them with video cameras and microphones around the clock. Over the course of 75 days, we recorded tens of thousands of vocalizations, for many of which (~15,000) we were able to determine both the behavioral context and the identities of the emitter and the addressee.
Importantly, our dataset covers the complete repertoire used by these bats over a period of 2.5 months, as we continuously recorded the vocalizations of the bats and their behavior.

Department of Zoology, Faculty of Life Sciences, Tel Aviv University, Tel Aviv, Israel. Correspondence and requests for materials should be addressed to Y.Y. (email: email@example.com)
One might expect most social interactions in a tightly packed group, such as a fruit bat colony, to be aggressive. Indeed, nearly all of the communication calls of the Egyptian fruit bat in the roost are emitted during aggressive pairwise interactions, involving squabbling over food or perching locations and protesting against mating attempts (Supplementary Videos S1–S4). These kinds of interactions, which are extremely common in social animals, are often grouped into a single agonistic behavioral category in bioacoustics studies. Moreover, in many bioacoustics studies, different calls are a priori separated into categories by human-discernible acoustic features, e.g. refs 14 and 37 (but see refs 10 and 38). Such an approach, however, was impossible with our data (Fig. 1; note how aggressive calls emitted in different contexts look and sound similar). We therefore adopted a machine-learning approach, which has proved effective in recognizing human speakers39, and used it to evaluate the information potential of the spectral composition of these vocalizations. We were able to identify, with high accuracy, the emitters of the vocalizations and their specific aggressive contexts (e.g., squabbling over food), and, to some extent, the addressees and the behavioral responses to the calls. Our study shows that bat vocalizations contain diverse types of information, already detectable in their spectral properties alone, and even among the seemingly similar aggressive contexts.

Figure 1. Distinct types of Egyptian fruit bat vocalizations. (A–E) Spectrograms of calls associated with different behaviors. Spectrograms show a time span of 1 s and a frequency band of 0 kHz (for A) and 0 kHz (for B–E). (A) Mating calls produced by males during mating. (B) Isolation calls produced by pups when left alone in the roost or when detached from their mothers. (C) Seclusion/distress calls emitted mostly by distressed adults that are separated from others. (D) Echolocation clicks: lingual click-based sonar signals. (E) Aggression calls: the most common call type, produced during the many everyday squabbles between pairs of bats. These vocalizations are composed of sequences of broad-band multi-harmonic syllables (usually 1–20 syllables). Two examples for each aggression (aggr.) context: (E1) Feeding aggr., (E2) Mating aggr., (E3) Perch aggr., and (E4) Sleep aggr. (F) Occurrence of aggression calls along the day, presented as average calls per 10 min per bat. White/gray background indicates light/dark time of day. The peaks at the beginning/end of the night are associated with the disintegration/reintegration of the daytime sleeping cluster. (G) Occurrence of aggression calls in the four analyzed contexts (assessed according to the proportions of these contexts among annotated recordings).

Results

Collected data. 75 days of continuous recordings of 22 bats (12 adults and 10 pups) yielded a dataset of 162,376 vocalizations, each consisting of a sequence of syllables (Fig. 1E). From synchronized videos we identified the emitter, addressee, context, and behavioral response. We included in the analysis 14,863 vocalizations of 7 adult females, for which we had enough data in the analyzed contexts (see Supplementary Tables S1–S3 and Materials and Methods). The spectral content of the vocalizations was represented by Mel-frequency cepstral coefficients (MFCC), which are the most common features used in human speech processing and which have also been found useful in the analysis of other animal vocalizations10,40 (see Materials and Methods). A statistical model was built for each class in each attribute type, e.g. a model for each specific context, each individual bat, each combination of the two, etc. (see illustration in Supplementary Fig. S1, and Materials and Methods for full details). All results refer to classification of test data, i.e., data that were not used for designing the classifier (cross-validation). Classification performance is reported as the balanced accuracy (BA): the number of correct classifications in each class, divided by the number of examples in that class, averaged over all classes. This measure mitigates biases that could arise from unbalanced class sizes.

Who addressed whom: recognizing the emitter and addressee. The emitters of the vocalizations were clearly identified with a BA of 71% (chance level was 14%, i.e., one out of seven; p < 0.01, permutation test; Fig. 2A,B, Supplementary Fig. S2), indicating that, potentially, a bat could acoustically recognize who is addressing it.
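The balanced accuracy defined above is the mean of the per-class recalls. The minimal Python sketch below (assuming NumPy; the function names are ours, not the authors') computes it from a confusion matrix of counts. As an illustration it uses approximate counts that we reconstructed from the rounded fractions and class sizes shown for the outcome task in Figure 4 (586 Depart vs. 4643 Remain together); they are not the study's raw data.

```python
import numpy as np

def balanced_accuracy(confusion_counts):
    """Mean per-class recall; rows are true classes, columns are predicted classes."""
    cm = np.asarray(confusion_counts, dtype=float)
    # Normalize each row by its class size so that every row sums to 1.
    row_normalized = cm / cm.sum(axis=1, keepdims=True)
    # The diagonal now holds the fraction of correct predictions in each class.
    return float(np.mean(np.diag(row_normalized)))

def plain_accuracy(confusion_counts):
    """Fraction of all examples on the diagonal, ignoring class sizes."""
    cm = np.asarray(confusion_counts, dtype=float)
    return float(np.trace(cm) / cm.sum())

# Approximate outcome-task counts (rows: true Depart, true Remain together).
outcome_cm = [[328, 258],
              [1532, 3111]]

# A degenerate classifier that always predicts "Remain together".
majority_cm = [[0, 586],
               [0, 4643]]

print(round(balanced_accuracy(outcome_cm), 3))   # 0.615
print(round(plain_accuracy(outcome_cm), 3))      # 0.658
print(round(balanced_accuracy(majority_cm), 3))  # 0.5
print(round(plain_accuracy(majority_cm), 3))     # 0.888
```

With these class sizes, always answering "Remain together" yields about 89% plain accuracy but only 50% BA, which is why BA, and not plain accuracy, is the reported measure.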
We could also successfully recognize the emitter when incorporating a larger number of emitters (n = 15 individuals), using recordings of adult bats from a previous experiment (BA of 56% for 15 individuals, where chance level was 7%; Supplementary Fig. S3; see Materials and Methods), indicating that this result can be extended as data availability increases. We then tested whether we could use the same method to recognize the addressee (i.e., the target bat) of each vocalization. We found significant differences between vocalizations directed to different addressees. Figure 2C shows the results for the emitter for which we had the largest dataset (F6), with a BA of 50% (chance was 33%, p < 0.01, permutation test; see Supplementary Fig. S4 for other emitters; all but one emitter were significant). Note that we tested addressee classification for every emitter separately, thus overcoming the risk of recognizing the emitter rather than the addressee. Part of the information regarding the identity of the addressees resulted from the differences between calls directed to males and calls directed to females. Indeed, the addressee's sex could be classified with a BA of 64% (chance was 50%, p < 0.01, permutation test; Fig. 2D). Nevertheless, when examining each emitter alone, the identity of specific female addressees (when males were excluded from the analysis) could be recovered above chance (for all but one emitter, Supplementary Fig. S4), revealing significant variation between calls directed at different recipients.
This implies that an eavesdropping bat is theoretically able, at least to some extent, to identify whether individual A is addressing individual B or individual C.

We tested for differences between vocalizations produced in the four distinct, most common aggressive contexts (see examples in Supplementary Videos S1–S4): feeding aggression (quarrelling over food), mating aggression (protest of females against males' mating attempts), perch aggression (face-to-face aggressive display, where the two bats perch in close vicinity), and sleep aggression (squabbling in the sleeping cluster). Typically in these interactions, a bat aversively violates the personal space of another bat (e.g., by sniffing it or trying to grab food from its mouth), and the other bat's reaction includes an acoustic protest. We could classify the behavioral context with a significant BA of 61% (chance was 25%, p < 0.01, permutation test), suggesting that an eavesdropper can potentially infer, fairly well, the context of the quarrel based on the vocalizations of the individuals involved (Fig. 3A,B). We controlled for biases that could arise from an unequal representation of the individuals in different contexts (e.g., individual A dominates sleeping interactions, so that our classifier actually recognizes the emitter and not the context). To this end, we also classified the context of the calls for each individual emitter separately, and again obtained significant separation between contexts, with even higher accuracies (an average BA of 75%; Supplementary Fig. S5). Furthermore, we were able to classify all context-emitter pairs (Fig. 3C). To illustrate the interplay between emitters and contexts, Fig. 3D,E shows the classification models constructed for two emitters (F5, F6) in two different contexts (feeding aggr. and perch aggr.). This 2D projection demonstrates how emitters can be separated along one acoustic dimension while context varies along another.
We also controlled for the possibility that interactions with food (e.g., having food in one's mouth) directly affected the acoustics of the feeding-related vocalizations, by analyzing only feeding interactions for which we could verify that the emitter did not have food in its mouth. Again, the results were significant, with a BA of 64% (Supplementary Fig. S6).

What was the outcome of the quarrel? Following the realization that much information resides in the spectral content of bat aggressive calls, we examined whether we could predict the behavioral outcome of each aggressive vocal interaction. To this end, we defined two crude (but easily distinguishable) possible outcomes: departing (each bat goes its own way, or one stays in place while the other leaves) and remaining together (both bats remained close by after the vocalization). Surprisingly, we could predict the outcome of a quarrel above chance level, with a BA of 62% (chance is 50%, p < 0.01, permutation test; Fig. 4). We further attempted to predict the outcome in more detail, namely which individual of the interacting pair would stay and which would leave following the interaction. Again, we could predict the detailed outcome significantly above chance level, with a BA of 41% (chance is 25%, p < 0.01, permutation test; Supplementary Fig. S7).
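The p-values above come from label-permutation tests (100 permutations per test, combined with leave-one-out cross-validation; see Materials and Methods). The Python sketch below, assuming NumPy, shows that logic under two stated assumptions: the nearest-centroid classifier is a hypothetical stand-in for the GMM-UBM classifier, and the (1 + k)/(1 + n) p-value estimator is our choice, since the exact formula is not given. With 100 permutations the smallest value this estimator can return is 1/101, just below 0.01.

```python
import numpy as np

def balanced_accuracy_from_labels(y_true, y_pred):
    """Mean per-class recall computed from label arrays."""
    classes = np.unique(y_true)
    return float(np.mean([np.mean(y_pred[y_true == c] == c) for c in classes]))

def loo_nearest_centroid_ba(X, y):
    """Leave-one-out BA of a nearest-centroid classifier (hypothetical stand-in for GMM-UBM)."""
    classes = np.unique(y)
    preds = np.empty_like(y)
    for i in range(len(y)):
        train = np.arange(len(y)) != i                      # hold out vocalization i
        centroids = np.array([X[train & (y == c)].mean(axis=0) for c in classes])
        preds[i] = classes[np.argmin(np.linalg.norm(centroids - X[i], axis=1))]
    return balanced_accuracy_from_labels(y, preds)

def permutation_test(X, y, evaluate, n_permutations=100, seed=0):
    """Observed score plus a p-value from re-running `evaluate` on shuffled labels."""
    rng = np.random.default_rng(seed)
    observed = evaluate(X, y)
    null_scores = np.array([evaluate(X, rng.permutation(y)) for _ in range(n_permutations)])
    # Count permutations that score at least as well as the true labels.
    p_value = (1 + np.count_nonzero(null_scores >= observed)) / (1 + n_permutations)
    return observed, float(p_value)

# Synthetic toy data: two well-separated clusters of 30 "vocalizations" each.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 1.0, size=(30, 2)), rng.normal(3.0, 1.0, size=(30, 2))])
y = np.array(["feeding"] * 30 + ["sleep"] * 30)
ba, p = permutation_test(X, y, loo_nearest_centroid_ba)
print(ba, p)
```

Because the classifier is retrained on every permuted labeling, the null distribution reflects the whole pipeline (training plus leave-one-out evaluation), not just the final scoring step.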
[…] outcome. The classification accuracies of the addressee and the outcome were inferior to those of the emitter and context. To make sure that these classifications were not a byproduct of an unequal representation of the emitters or contexts, we performed a series of conservative tests, classifying the addressee or the outcome for individual emitters in specific contexts. For addressee identification, the classification accuracy was significantly better than chance for 12 out of the 14 tested cases (p < 0.01 in 11 cases, p = 0.04 in one case, n.s. in 2 cases: p = 0.08 and p = 0.09, permutation test; Supplementary Table S5). For the outcome classification, the accuracy was significantly better than chance for 12 out of the 13 tested cases (p < 0.01 in 8 cases, p ≤ 0.03 in 4 cases, n.s. in one case: p = 0.08, permutation test; Supplementary Table S6), and for the detailed outcome (which included the prediction of who stayed and who left) the accuracy was again significantly better than chance for all 12 tested cases (p ≤ 0.02, permutation test; Supplementary Table S7). We could also classify the context and the addressee's sex for individual emitters. This classification was performed for two emitters for which we had enough data in all 8 combinations, i.e., male and female addressees in 4 contexts (emitter F5: BA 40%, chance 12.5%, p < 0.01; emitter F6: BA 35%, chance 12.5%, p < 0.01, permutation tests; Supplementary Fig. S8). These results indicate that, independently of the emitter and context, the vocalizations contain information about their addressee and about the outcome of the interaction.

Figure 2. Emitter and addressee of the vocalizations. (A) A confusion matrix for the prediction of the emitting individuals. The number in each box indicates the fraction of each true class (row) assigned to each predicted class (column), i.e., each row sums to 1. The total number of vocalizations in each class (N) is indicated to the right of the matrix. Yellow: highest value in the matrix; blue: lowest value (BA = 71%, chance = 14%, p < 0.01). (B) 2D projection of the vocalizations of different emitters (test data only) through the trained models: each vocalization is represented as a 7-dimensional vector (the score it received for each model), then LDA is applied (for visualization only) and the first two linear discriminants are used (see Materials and Methods). Inset: only emitters F5–F7 in an alternative projection. (C) A confusion matrix for the prediction of the addressees of vocalizations produced by bat F6 (BA = 50%, chance = 33%, p < 0.01). (D) A confusion matrix for the prediction of vocalizations addressed to a male vs. those addressed to a female (BA = 64%, chance = 50%, p < 0.01).

Discussion

We have shown that social vocalizations of fruit bats contain multilayered information. This information is available even in calls that would previously have been categorized as belonging to the same crude context of aggressive behavior, and which sound very similar to the human ear. We continuously recorded the bats and assembled a dataset that represents the full vocal repertoire used during the experimental period. This very large dataset of vocal interactions included a careful categorization of the context and participants, and enabled us to uncover the information embedded in everyday aggressive vocalizations. While the natural function of this information is yet to be discovered, we define the encapsulated information as the potential knowledge that could be extracted from the acoustics alone (see ref. 41 for a discussion of information in animal signals). To a listener bat, the spectral composition alone already holds information about the identity of the emitter of a specific call and its specific aggressive context. Furthermore, the outcome of the interaction can be predicted above chance level by listening. Interestingly, the spectral composition also carries information about the addressee of a vocalization.
Even if the acoustic differences were merely a result of the emotional state of the animal, an eavesdropper (or an addressee) could potentially use them to extract valuable information42. It is important to note that we used one set of acoustic features for classification; many other multidimensional spectro-temporal representations could be tested. The bat's brain could thus be using some other representation that encapsulates much more information regarding different social aspects. The bat may be able to classify the context of an interaction with higher confidence, based on some acoustic feature that it evolved to use and that is yet to be determined. Our analysis is thus probably only a lower bound on what a bat is capable of extracting from aggressive social vocalizations. For example, we did not include any temporal information in our analysis. Temporal features (e.g., amplitude and frequency modulations) exhibit large variability among bat vocalizations and are known to contribute much to human speech recognition (e.g., ref. 43). Furthermore, a two-step classification process improved the accuracy. For example, the accuracy of identifying the context of a vocalization increased to 75% when the emitter was classified first, as opposed to 61% when the emitter was unknown.

[Figure 3 panels: (A) confusion matrix, True (context) × Predicted (context), for Feeding, Mating, Perch, and Sleep aggr. (N = 578, 1758, 766, 2153); (B) projection of the contexts, Discrim. 1 × Discrim. 2; (C) confusion matrix, True (bat ID, context) × Predicted (bat ID, context), for emitters F1–F7; (D, E) projections of emitters F5 and F6 in Feeding aggr. and Perch aggr., Discrim. 1 × Discrim. 2.]

Figure 3. The context of the vocalizations. (A) A confusion matrix for the prediction of the context (BA = 61%, chance = 25%, p < 0.01). The number in each box indicates the fraction of each true class (row) assigned to each predicted class (column). The total number of vocalizations in each class (N) is indicated to the right of the matrix. (B) Projection of the different contexts through the trained models (see Materials and Methods). Inset: only contexts perch aggr. and sleep aggr. in another projection. (C) A confusion matrix for the prediction of the context and emitter together (BA = 57%, chance = 4%, p < 0.01; exact numbers for each cell are given in Supplementary Table S4). (D and E) Projection of two emitters, F5 and F6, in two contexts, feeding aggr. and perch aggr. (D) and (E) depict the same plot: red, F5; blue, F6; dark color, feeding aggr.; light color, perch aggr. The black and grey lines in (D) portray the 0.3 isoline of the 2D histogram (with max. normalized to 1) of feeding aggr. and perch aggr., respectively. The red and blue lines in (E) portray the 0.3 isoline of the 2D histogram (with max. normalized to 1) of F5 and F6, respectively.

While our results demonstrate that the spectral content of fruit bat vocalizations encapsulates valuable information, it is not straightforward to pinpoint a specific acoustic feature that uniquely defines one vocal class. When examining the four analyzed behavioral contexts, for instance, the distribution of the cepstral coefficients (which were used for classification) reveals a widespread overlap between different contexts (Fig. 5). Yet, each context encompasses specific regions of extreme values that are more characteristic of it than of the other contexts (Fig. 5A,B, dashed rectangles). Since each vocalization contains various values for each feature (changing along and between the syllables), it is sufficient that the acoustics of a small portion of the vocalization fall outside the overlapping region for its context to be correctly classified. This type of inference, which is based on a distinguishable part of a sequence, may be analogous to classifying a conversation as food-related if the word "banana" appears somewhere in it. Figure 5 depicts examples of syllables that contain such extreme values of cepstral coefficients in different contexts.

One of our surprising results was that vocalizations contain some information about the identity of their addressee. Some studies have suggested the existence of addressee-specific calls in dolphins44; others have demonstrated how audience composition may affect primate vocalizations45. Our finding might be akin to a human speaker who uses varying intonation towards different listeners (e.g., male vs. female addressees) while using the same words. The importance of vocal communication increases when vision is limited (e.g.
in dense forests or underwater); thus, it is reasonable to assume that the vocalizations of a social mammal that roosts in dark caves will evolve to convey elaborate information about the interactions between individuals. Most of this species' vocalizations are emitted during aggressive encounters that include squabbling over food, sleeping spots, or other resources. Importantly, these vocalizations are directed at a specific addressee, the interacting peer (as is the case in human speech), rather than broadcast, as is the case for birdsong, many types of alarm calls, and human non-linguistic vocal behaviors such as laughter, crying, or screams. Such mundane calls, which accompany routine behaviors, account for the great majority of fruit bat communication and govern the vocal communication of many species. However, these calls are often grouped into one category in acoustic studies due to the difficulty of cataloguing them in detail, both acoustically and behaviorally. Inspired by the behavioral usage of human language, we believe that it is extremely beneficial to delve into the everyday chitchat among animals when attempting to trace the route of language evolution.

[Figure 4 panel: confusion matrix, True (outcome) × Predicted (outcome). Depart (N = 586): 0.56 Depart, 0.44 Remain together. Remain together (N = 4643): 0.33 Depart, 0.67 Remain together.]

Figure 4. Predicting the behavioral outcome of the interaction. A confusion matrix for the prediction of the outcome of the vocal interactions. Two possible outcomes were defined: Depart and Remain together (BA = 62%, chance = 50%, p < 0.01). The number in each box indicates the fraction of each true class (row) assigned to each predicted class (column). The total number of vocalizations in each class (N) is indicated to the right of the matrix.

Materials and Methods

Animal retrieval and care. Adult bats (Rousettus aegyptiacus) were captured in a natural roost near Herzliya, Israel. This roost is regularly inhabited by a colony of 5,000 to 10,000 bats. The bats were kept in acoustic chambers large enough to allow flight, and were fed a variety of local fruit. All experiments were reviewed and
approved by the Animal Care Committee of Tel Aviv University (Number L-13-016), and were performed in accordance with its regulations and guidelines regarding the care and use of animals for experimental procedures. The use of bats was approved by the Israeli National Park Authority.

Bat housing and monitoring. The bats were housed in 2 identical cages (acoustic chambers; for illustrations refer to ref. 36), with 6 females, 5 pups, and 1 male in cage 1, and 4 females, 4 pups, 1 male, and 1 young bat (of unknown sex) in cage 2. The cages were continuously monitored for 75 days with IR-sensitive cameras and omnidirectional electret ultrasound microphones (Avisoft-Bioacoustics Knowles FG-O; 2 microphones in each cage). Audio was sampled using an Avisoft-Bioacoustics UltraSoundGate 1216H A/D converter with a sampling rate of 250 kHz. The chambers were acoustically isolated and their walls were covered with foam to diminish echoes. Raw audio recordings were automatically segmented and filtered for noise and echolocation clicks, leaving only bat social communication calls (see ref. 36 for details of this process). Video was synchronized to the audio, resulting in a short movie accompanying each audio recording. Videos were then analyzed by trained students, who identified the circumstances of each call (emitter, addressee, context, and behavioral response; see details below). The bats were individually marked using a collar with a reflective disc. The observers were cross-validated during their training to ensure qualified annotations. An emitter bat was recognized by its mouth movements, and 2 cameras could be used to verify a distinct assignment. If there was any doubt regarding the emitter's identity, we excluded the vocalization from the analysis. This conservative approach is the main cause for the exclusion of almost 90% of the vocalizations from our analysis. There was a negligible number of events when two bats vocalized together (or shortly after each other).
These events could be easily distinguished from the spectrograms. We managed to annotate 19,021 calls with all of the details required for classification. We then used only individuals for which we had enough vocalizations in at least 3 of the tested contexts (at least 15 per context). The analyzed data hence consist of 14,863 calls produced by 7 adult females (F1–F7). We classified the emitter of each vocalization among these 7 females (the males produced far fewer vocalizations and hence were not used in this study). To extend the emitter recognition to a larger number of individuals, we used all bats, including adults that were previously recorded in the same setup, excluding pups and individuals with fewer than 400 recorded vocalizations, ending up with 15 individuals (Supplementary Fig. S3).

Four aggressive contexts were included in the analysis: (1) Feeding aggression: interactions during feeding or in close proximity to the food; (2) Mating aggression: produced by females in protest of males' mating attempts; (3) Perch aggression: emitted when two bats perched close to each other confronted one another, displaying aggressive acts accompanied by rapid wing movements, but with no close contact; and (4) Sleep aggression: squabbling over locations, or other aggression, in the daytime sleeping cluster. For the emitter and addressee classification we also included vocalizations for which the context was not conclusive ("General" in Supplementary Table S1). In this General (unidentified) aggression context the interacting bats are usually 10 cm apart, while in the other contexts they were ca. 0 cm apart. The Mating aggression context was not used in the addressee classification task, as these vocalizations were exclusively directed toward the male (hence identifying the addressee in this case could result from solely identifying the context). In the prediction of the addressees of the vocalizations, we used all addressees with at least 20 calls addressed to them. The outcome of a vocal interaction was defined as one of two options: (1) Depart: the two bats split after the interaction, where either both went their own way, or one of them left and the other stayed in place; (2) Remain together: the two interacting bats stayed in the same position (in close proximity) after the interaction ended. In controlling for emitter/context influence on addressee/outcome classifications (i.e., vocalizations in a specific context by individual emitters), we allowed classes (addressees or outcomes) with at least 10 calls, in order to extend the coverage of different cases.

[Figure 5 panels: (A) Cepstral 1 vs. Cepstral 2 and (B) Cepstral 6 vs. Cepstral 8, with isolines for Feeding, Mating, Perch, and Sleep aggr.; (C–F) bar plots of context fractions and example spectrograms, Frequency (kHz) vs. time (scale bar 50 ms).]

Figure 5. Acoustic features conveying the information. An example of the distribution of two pairs of acoustic features in the analyzed contexts. Colored lines mark the 0.5 isoline of the 2D histograms (with max. normalized to 1). (A) Mel-cepstral coefficients 1 and 2. (B) Mel-cepstral coefficients 6 and 8. In both cases, the distributions mostly overlap each other. The least overlapping regions are marked with dashed rectangles. (C–F) For each such region, the fraction of each context, out of the vocalizations that contain values inside the marked rectangle, is shown in the bar plot. Spectrograms of two syllables from the prevalent context in each region are presented, i.e., (C) mating aggr., (D) feeding aggr., (E) perch aggr., and (F) sleep aggr. Red bars below the spectrograms mark the regions that contain values inside the marked rectangle. For all spectrograms: frequency axis is 0 kHz, time scale as indicated in (D). To avoid any possible biases in this example, the displayed distributions are taken only from one recording channel (the one with the largest yield, channel 3, cage 2), all of the spectrograms are from vocalizations produced by the same individual (F7), and in each pair of spectrograms each was taken from a different day. In total, 64 cepstral coefficients were used to describe vocalizations, allowing separation of classes along many more dimensions.

Egyptian fruit bat vocalizations consist of multisyllabic sequences, with short periods of silence between the syllables (Fig. 1E, Supplementary Fig. S1). Each vocalization was first automatically segmented (as described in ref. 36), retrieving only the voiced segments (see voiced and unvoiced bars in Supplementary Fig. S1).
Then, a sliding window of 20 ms (with an overlap of 19 ms between consecutive windows) was used to extract 64 Mel-frequency cepstral coefficients (MFCC) from each window. MFCCs assume a logarithmic pitch scale, which is typical for mammals (including bats)46. The mel scale was originally tuned for human perception. However, as we did not intend to mimic the bat's auditory system precisely, but only to test for available information, and as there is no equivalent bat scale, we chose to use it. The feature vectors retrieved from all segments were joined into one set of 64-dimensional feature vectors representing the vocalization (Supplementary Fig. S1). The MFCCs were normalized by subtracting their mean for every recording channel (2 channels in each cage), as is commonly done to reduce recording-channel biases.

The GMM-UBM algorithm was used for classification (following ref. 39, which used it for human speaker recognition). In short, given a labeled training set of vocalizations, for each class (e.g., emitter F1, the context Feeding aggr., etc.) all sets of feature vectors from all vocalizations of this class are pooled together into one mega-set. This mega-set is then modeled by a Gaussian mixture model (GMM) of 16 Gaussian components. However, instead of directly fitting the GMM to the data, the GMM parameters are estimated using an adaptive method based on a universal background model (UBM) (see details of the procedure in Supplementary Fig. S1 and Supplementary Methods). The UBM is a GMM fitted to another set of data, which was not used for training or testing. To this end, we employed the data that were not part of the analysis due to a lack of detailed annotations. We drew a random sample of syllables from all of the vocalizations for which the identity of the pair was known but the role of each individual was not certain (i.e., who addressed whom). This sample constituted the background set of 3900 syllables, and its corresponding UBM was used for all of the classifications.
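The UBM-based adaptation can be made concrete with a minimal NumPy sketch. It assumes the common mean-only MAP adaptation of a diagonal-covariance UBM with a relevance factor of 16, as used in GMM-UBM speaker recognition; the exact variant used here is given in the Supplementary Methods, so treat these choices as assumptions. For brevity, the toy UBM has 2 hand-set components in 2 dimensions rather than 16 components fitted by EM on 64-dimensional MFCCs (in practice the UBM could be fitted with, e.g., scikit-learn's GaussianMixture class with covariance_type='diag'). The sketch also includes likelihood-ratio scoring against the UBM, averaged per frame, which ranks the classes of a given vocalization in the same order as the raw likelihood ratio.

```python
import numpy as np

def log_joint(X, weights, means, variances):
    """log w_k + log N(x_t | mu_k, diag(var_k)) for every frame t (rows of X) and component k."""
    diff = X[:, None, :] - means[None, :, :]                      # (T, K, D)
    log_norm = np.sum(np.log(2.0 * np.pi * variances), axis=1)    # (K,)
    log_pdf = -0.5 * (np.sum(diff ** 2 / variances[None, :, :], axis=2) + log_norm[None, :])
    return np.log(weights)[None, :] + log_pdf                     # (T, K)

def logsumexp_rows(a):
    """Numerically stable log(sum(exp(a), axis=1))."""
    m = a.max(axis=1, keepdims=True)
    return (m + np.log(np.exp(a - m).sum(axis=1, keepdims=True)))[:, 0]

def map_adapt_means(X, ubm, relevance=16.0):
    """Mean-only MAP adaptation of a (weights, means, variances) UBM to pooled frames X."""
    weights, means, variances = ubm
    lj = log_joint(X, weights, means, variances)
    gamma = np.exp(lj - logsumexp_rows(lj)[:, None])              # responsibilities (T, K)
    n_k = gamma.sum(axis=0)                                       # soft frame counts (K,)
    e_k = gamma.T @ X / np.maximum(n_k, 1e-10)[:, None]           # per-component data means
    alpha = (n_k / (n_k + relevance))[:, None]                    # trust the data where n_k is large
    return weights, alpha * e_k + (1.0 - alpha) * means, variances

def llr_score(X, class_model, ubm):
    """Average per-frame log p(x_t | class model) - log p(x_t | UBM)."""
    return float(np.mean(logsumexp_rows(log_joint(X, *class_model))
                         - logsumexp_rows(log_joint(X, *ubm))))

def predict(X, class_models, ubm):
    """Label of the class model with the highest score for vocalization X."""
    scores = {label: llr_score(X, model, ubm) for label, model in class_models.items()}
    return max(scores, key=scores.get)

# Hypothetical toy setup: 2-D "frames" and a 2-component UBM with unit variances.
ubm = (np.array([0.5, 0.5]), np.array([[-2.0, 0.0], [2.0, 0.0]]), np.ones((2, 2)))
rng = np.random.default_rng(0)

def toy_frames(y_offset, n):
    """n synthetic frames around x = -2 or +2, shifted vertically by y_offset."""
    x = rng.choice([-2.0, 2.0], size=n)
    return np.column_stack([x, np.full(n, y_offset)]) + 0.5 * rng.normal(size=(n, 2))

class_models = {"A": map_adapt_means(toy_frames(1.0, 400), ubm),
                "B": map_adapt_means(toy_frames(-1.0, 400), ubm)}
print(predict(toy_frames(1.0, 50), class_models, ubm))  # expected: A
```

The relevance factor controls how far each component moves from the UBM: components that see few frames of a class stay close to the background model, which is what makes the approach usable for classes with little training data.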
A test sample, i.e., a vocalization unseen by the model-training algorithm, then received a score for each possible class (e.g., each context). The score was computed as the ratio between the likelihood that the sample was drawn from the specified class (computed using the learned GMM) and the likelihood that the sample was drawn from the UBM. This process results in each sample in the test set having a score for each possible class, and the class with the maximum score is chosen as the prediction for this sample. Because some classes had small sample sizes in a few classification tasks, we adopted a leave-one-out approach to cross-validation (for all tasks): iterating over the entire set, each vocalization in turn is left out of the training set and then given a prediction by the trained model (so every prediction is made on a vocalization unseen by the training algorithm). Classification success was measured using the balanced accuracy (BA): first, the confusion matrix is normalized by each class size (i.e., each row sums to 1, and the diagonal holds the fraction of correct predictions in each class); the BA is then the average of the confusion-matrix diagonal. To estimate a p-value for each success rate we ran permutation tests, in which we permuted the labels of the original set (e.g., we shuffled the contexts). We used 100 permutations in each test. To exclude any influence of context-dependent background noise on the classification success, we verified that our results could not be replicated by classifying non-voiced recordings from the analyzed contexts.

Each vocalization is represented in our data as a set of 64-dimensional vectors, where the size of this set varies with the duration of the vocalization. To illustrate the interplay between the acoustics of different classes, we projected each vocalization onto a 2-dimensional plane (Figs 2B and 3B,D,E).
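The evaluation procedure (leave-one-out prediction, balanced accuracy, and a label-permutation test) can be sketched in a few lines of numpy. To keep the example short and self-contained, a hypothetical nearest-class-mean scorer stands in for the GMM-UBM model; `fit` and `predict` are placeholders for any classifier, and the add-one correction in the p-value is our assumption, since the text does not state how the p-value was computed from the 100 permutations.

```python
import numpy as np


def balanced_accuracy(y_true, y_pred, classes):
    """Average of the diagonal of the confusion matrix after normalizing each row."""
    index = {c: i for i, c in enumerate(classes)}
    cm = np.zeros((len(classes), len(classes)))
    for t, p in zip(y_true, y_pred):
        cm[index[t], index[p]] += 1
    cm /= cm.sum(axis=1, keepdims=True)        # row = true class, sums to 1
    return float(np.mean(np.diag(cm)))


def leave_one_out_predictions(X, y, fit, predict):
    """Predict each sample with a model trained on all the other samples."""
    preds = []
    for i in range(len(y)):
        train = np.arange(len(y)) != i
        model = fit(X[train], y[train])
        preds.append(predict(model, X[i]))
    return np.array(preds)


def permutation_test(X, y, fit, predict, n_perm=100, seed=0):
    """Observed leave-one-out balanced accuracy and a permutation p-value."""
    classes = sorted(set(y.tolist()))
    observed = balanced_accuracy(y, leave_one_out_predictions(X, y, fit, predict), classes)
    rng = np.random.default_rng(seed)
    null = []
    for _ in range(n_perm):
        y_perm = rng.permutation(y)            # shuffle labels, class sizes unchanged
        preds = leave_one_out_predictions(X, y_perm, fit, predict)
        null.append(balanced_accuracy(y_perm, preds, classes))
    # Add-one correction (an assumption) keeps p above zero with a finite null set
    p_value = (1 + sum(b >= observed for b in null)) / (n_perm + 1)
    return observed, p_value


# Hypothetical stand-in for the GMM-UBM: score = negative squared distance to a class mean
def fit_class_means(X, y):
    return {c: X[y == c].mean(axis=0) for c in np.unique(y)}


def predict_max_score(model, x):
    scores = {c: -float(np.sum((x - m) ** 2)) for c, m in model.items()}
    return max(scores, key=scores.get)         # class with the maximum score
```

In the GMM-UBM setting, `fit` would adapt one model per class from the UBM and `predict` would return the class with the highest likelihood-ratio score. With 100 permutations and this correction, the smallest attainable p-value is 1/101, roughly 0.01.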
To produce these projections, we assigned each vocalization a new feature vector containing the scores it received from our algorithm for each class. Thus, each vocalization was represented by a single C-dimensional vector (where C is the number of classes in the classification task, e.g., 7 for emitter classification). For visualization, we then used the first 2 dimensions of a linear discriminant analysis applied to this new set of C-dimensional vectors. This process can be viewed as a type of multidimensional scaling: from a variable number of dimensions (each vocalization was described by a different number of 64-dimensional vectors), through the lens of our models, onto a 2-dimensional plane. Importantly, this process is applied to the scores received when the vocalizations were in the test set, i.e., unseen by the training algorithm.
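A minimal numpy sketch of that projection, assuming the standard Fisher formulation of linear discriminant analysis (within- and between-class scatter matrices, solved as a symmetric eigenproblem after whitening the within-class scatter). The text does not say which LDA implementation was used; the function name and the small ridge term are illustrative assumptions. `S` holds one row of C test-set scores per vocalization.

```python
import numpy as np


def lda_project_2d(S, y):
    """Project score vectors S (N, C) onto their first two Fisher discriminants."""
    classes = np.unique(y)
    grand_mean = S.mean(axis=0)
    C = S.shape[1]
    Sw = np.zeros((C, C))                      # within-class scatter
    Sb = np.zeros((C, C))                      # between-class scatter
    for c in classes:
        Sc = S[y == c]
        mc = Sc.mean(axis=0)
        Sw += (Sc - mc).T @ (Sc - mc)
        d = (mc - grand_mean)[:, None]
        Sb += len(Sc) * (d @ d.T)
    # Whiten Sw; a tiny ridge guards against nearly collinear score columns
    ridge = 1e-6 * np.trace(Sw) / C
    evals, evecs = np.linalg.eigh(Sw + ridge * np.eye(C))
    W = evecs / np.sqrt(evals)                 # W.T @ Sw @ W is ~ identity
    # In the whitened space, Sb v = lambda Sw v becomes a symmetric eigenproblem
    b_evals, b_evecs = np.linalg.eigh(W.T @ Sb @ W)
    top2 = np.argsort(b_evals)[::-1][:2]
    directions = W @ b_evecs[:, top2]          # (C, 2) discriminant axes
    return (S - grand_mean) @ directions
```

The returned (N, 2) array can be scatter-plotted with one colour per class. With C classes, LDA yields at most C - 1 informative axes, so two axes are available whenever C ≥ 3.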
References
1. Bradbury, J. W. & Vehrencamp, S. L. Principles of animal communication (Sinauer Associates, 2011).
2. Fitch, W. T., Neubauer, J. & Herzel, H. Calls out of chaos: the adaptive significance of nonlinear phenomena in mammalian vocal production. Anim. Behav. 63, 407 (2002).
3. Macedonia, J. M. & Evans, C. S. Essay on Contemporary Issues in Ethology: Variation among Mammalian Alarm Call Systems and the Problem of Meaning in Animal Signals. Ethology 93, 177 (1993).
4. Slocombe, K. E. & Zuberbühler, K. Functionally referential communication in a chimpanzee. Curr. Biol. 15, 1779 (2005).
5. Evans, C. & Evans, L. Chicken food calls are functionally referential. Anim. Behav. 58, 307 (1999).
6. Wheeler, B. C. & Fischer, J. Functionally referential signals: a promising paradigm whose time has passed. Evol. Anthropol. 21, 195 (2012).
7. Seyfarth, R. M. et al. The central importance of information in studies of animal communication. Anim. Behav. 80, 3 (2010).
8. Ballentine, B., Searcy, W. A. & Nowicki, S. Reliable aggressive signalling in swamp sparrows. Anim. Behav. 75, 693 (2008).
9. Giret, N., Albert, A., Nagle, L., Kreutzer, M. & Bovet, D. Context-related vocalizations in African grey parrots (Psittacus erithacus). Acta Ethol. 15, 39 (2012).
10. Elie, J. E. & Theunissen, F. E. The vocal repertoire of the domesticated zebra finch: a data-driven approach to decipher the information-bearing acoustic features of communication signals. Anim. Cogn. 19, 285 (2015).
11. Henderson, E. E., Hildebrand, J. A., Smith, M. H. & Falcone, E. A. The behavioral context of common dolphin (Delphinus sp.) vocalizations. Mar. Mammal Sci. 28, 439 (2012).
12. Filatova, O. A., Guzeev, M. A., Fedutin, I. D., Burdin, A. M. & Hoyt, E. Dependence of killer whale (Orcinus orca) acoustic signals on the type of activity and social context. Biol. Bull. 40, 790 (2013).
13. Molnár, C. et al. Classification of dog barks: a machine learning approach. Anim. Cogn. 11, 389 (2008).
14. Bohn, K. M., Schmidt-French, B., Ma, S. T. & Pollak, G. D. Syllable acoustics, temporal patterns, and call composition vary with behavioral context in Mexican free-tailed bats. J. Acoust. Soc. Am. 124 (2008).
15. Gadziola, M. A., Grimsley, J. M. S., Faure, P. A. & Wenstrup, J. J. Social Vocalizations of Big Brown Bats Vary with Behavioral Context. PLoS One 7, e44550 (2012).
16. Crockford, C. & Boesch, C. Context-specific calls in wild chimpanzees, Pan troglodytes verus: analysis of barks. Anim. Behav. 66, 115 (2003).
17. Manser, M. B. The acoustic structure of suricates' alarm calls varies with predator type and the level of response urgency. Proc. R. Soc. London B Biol. Sci. 268, 2315 (2001).
18. Price, T. et al. Vervets revisited: A quantitative analysis of alarm call structure and context specificity. Sci. Rep. 5, 13220 (2015).
19. Templeton, C. N., Greene, E. & Davis, K. Allometry of Alarm Calls: Black-Capped Chickadees Encode Information About Predator Size. Science 308, 1934 (2005).
20. Bohn, K. M., Wilkinson, G. S. & Moss, C. F. Discrimination of infant isolation calls by female greater spear-nosed bats, Phyllostomus hastatus. Anim. Behav. 73, 423 (2007).
21. Clemins, P. J., Johnson, M. T., Leong, K. M. & Savage, A. Automatic classification and speaker identification of African elephant (Loxodonta africana) vocalizations. J. Acoust. Soc. Am. 117 (2005).
22. Gillam, E. H. & Chaverri, G. Strong individual signatures and weaker group signatures in contact calls of Spix's disc-winged bat, Thyroptera tricolor. Anim. Behav. 83, 269 (2012).
23. Ji, A., Johnson, M. T., Walsh, E. J., McGee, J. & Armstrong, D. L. Discrimination of individual tigers (Panthera tigris) from long distance roars. J. Acoust. Soc. Am. 133 (2013).
24. Janik, V. & Sayigh, L. Communication in bottlenose dolphins: 50 years of signature whistle research. J. Comp. Physiol. A 199, 479 (2013).
25. Kershenbaum, A., Sayigh, L. S. & Janik, V. M. The Encoding of Individual Identity in Dolphin Signature Whistles: How Much Information Is Needed? PLoS One 8, e77671 (2013).
26. Gamba, M., Colombo, C. & Giacoma, C. Acoustic cues to caller identity in lemurs: a case study. J. Ethol. 30, 191 (2012).
27. Knörnschild, M., Feifel, M. & Kalko, E. K. V. Mother-offspring recognition in the bat Carollia perspicillata. Anim. Behav. 86, 941 (2013).
28. Mouterde, S. C., Theunissen, F. E., Elie, J. E., Vignal, C. & Mathevon, N. Acoustic Communication and Sound Degradation: How Do the Individual Signatures of Male and Female Zebra Finch Calls Transmit over Distance? PLoS One 9, e102842 (2014).
29. Notman, H. & Rendall, D. Contextual variation in chimpanzee pant hoots and its implications for referential communication. Anim. Behav. 70, 177 (2005).
30. Kastein, H., Winter, R., Vinoth Kumar, A. K., Kandula, S. & Schmidt, S. Perception of individuality in bat vocal communication: discrimination between, or recognition of, interaction partners? Anim. Cogn. 16, 945 (2013).
31. Ancillotto, L. & Russo, D. Individual vs. non-individual acoustic signalling in African woodland dormice (Graphiurus murinus). Mamm. Biol. Zeitschrift für Säugetierkd. 81, 410 (2016).
32. Slocombe, K. E. & Zuberbühler, K. Food-associated calls in chimpanzees: responses to food types or food preferences? Anim. Behav. 72, 989 (2006).
33. Slocombe, K. E. & Zuberbühler, K. Agonistic Screams in Wild Chimpanzees (Pan troglodytes schweinfurthii) Vary as a Function of Social Role. J. Comp. Psychol. 119, 67 (2005).
34. Wilkinson, G. S. & Boughman, J. W. Social calls coordinate foraging in greater spear-nosed bats. Anim. Behav. 55, 337 (1998).
35. Kwiecinski, G. G. & Griffiths, T. A. Rousettus egyptiacus. Mamm. Species 1 (1999).
36. Prat, Y., Taub, M. & Yovel, Y. Vocal learning in a social mammal: Demonstrated by isolation and playback experiments in bats. Sci. Adv. 1 (2015).
37. Salmi, R., Hammerschmidt, K. & Doran-Sheehy, D. M. Western Gorilla Vocal Repertoire and Contextual Use of Vocalizations. Ethology 119, 831 (2013).
38. Keenan, S., Lemasson, A. & Zuberbühler, K. Graded or discrete? A quantitative analysis of Campbell's monkey alarm calls. Anim. Behav. 85, 109 (2013).
39. Reynolds, D. A., Quatieri, T. F. & Dunn, R. B. Speaker Verification Using Adapted Gaussian Mixture Models. Digit. Signal Process. 10, 19 (2000).
40. Mielke, A. & Zuberbühler, K. A method for automated individual, species and call type recognition in free-ranging animals. Anim. Behav. 86, 475 (2013).
41. Rendall, D., Owren, M. J. & Ryan, M. J. What do animal signals mean? Anim. Behav. 78, 233 (2009).
42. Schmidt, S. Beyond echolocation: emotional acoustic communication in bats. Evol. Emot. Commun. from sounds Nonhum. Mamm. to speech Music man 92 (2013).
43. Shannon, R. V., Zeng, F.-G., Kamath, V., Wygonski, J. & Ekelid, M. Speech Recognition with Primarily Temporal Cues. Science 270, 303 (1995).
44. King, S. L. & Janik, V. M. Bottlenose dolphins can use learned vocal labels to address each other. Proc. Natl. Acad. Sci. 110, 13216 (2013).
45. Slocombe, K. E. & Zuberbühler, K. Chimpanzees modify recruitment screams as a function of audience composition. Proc. Natl. Acad. Sci. 104, 17228 (2007).
46. Grinnell, A. D. In Hear. by Bats (Popper, A. N. & Fay, R.) 1, doi: 10.1007/978-1-4612-2556-0_1 (Springer New York, 1995).
Acknowledgements
We would like to thank Vlad Demartsev and Irit Opher for commenting on the manuscript and for insightful discussions. We warmly thank Ester Pratt for her excellent, thorough work in annotating the videos. Y.P. was supported by The Colton Foundation. We also thank the Ministry of Science, Technology and Space (Grant 3) for supporting Y.P. This work was supported by the Israeli National Foundation, grant number 559/13.

Author Contributions
Y.P., M.T., and Y.Y. conceived and designed the experiment. Y.P. designed and constructed the setup. Y.P. and M.T. conducted the experiments. M.T. analyzed the videos. Y.P. created the processing and analysis tools and performed the analysis. Y.Y. supervised the study. Y.P. and Y.Y. wrote the manuscript and M.T. reviewed it.

Additional Information
Supplementary information accompanies this paper at http://www.nature.com/srep
Competing financial interests: The authors declare no competing financial interests.
How to cite this article: Prat, Y. et al. Everyday bat vocalizations contain information about emitter, addressee, context, and behavior. Sci. Rep. 6, 39419; doi: 10.1038/srep39419 (2016).
Publisher's note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
This work is licensed under a Creative Commons Attribution 4.0 International License. The images or other third party material in this article are included in the article's Creative Commons license, unless indicated otherwise in the credit line; if the material is not included under the Creative Commons license, users will need to obtain permission from the license holder to reproduce the material. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/
© The Author(s) 2016