
InvaCost - Frequently Asked Questions

2021-10-29

This page lists the most common questions asked about the InvaCost database, along with the answers we provided. If you have a question that is not addressed here or in the database descriptor paper, please contact us by email or open an issue on this GitHub page.

General questions on InvaCost

The database and all associated files (including complete definition and details of each descriptor, the template you should use to provide correction or integration of cost data, and the summary of all changes made between the successive versions of the database) are freely available and downloadable at https://doi.org/10.6084/m9.figshare.12668570.

Which cost estimates should I use for my analyses?

There are several fields that contain economic values (they generally contain the word cost in their names), but you probably don’t need most of them. We stored the original cost information from the papers (Raw_cost_estimate_local_currency column) and standardised it to 2017 USD using two different methods, based on either the exchange rate (ER) or purchasing power parity (PPP). However, these raw values are often difficult to compare because the costs occurred over different periods of time (e.g. 1 year vs 10 years). Therefore, we also standardised all cost estimates on an annual basis. We recommend you use one of the two standardised annual cost columns, described under the Cost_estimate_per_year_2017 (USD_exchange_rate or USD_PPP) descriptor.

Should I pay attention to specific cost information in order to build my dataset prior to analyses?

InvaCost is the most up-to-date and comprehensive compilation of both published and grey literature on the economic costs of biological invasions. It therefore includes sound estimates of realized economic costs of invasive species, with detailed sources and reproducible methodology, but it also includes non-observed, unsourced and/or irreproducible estimates. We recommend focusing on cost data that are observed (Implementation column) and of high reliability (Method_reliability column). Nonetheless, depending on your objectives, you may also want to consider potential costs that are of high reliability. Overall, you should first define your objectives and then filter the cost data from the database accordingly.
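As a rough illustration of this filtering step, the following Python sketch keeps only entries that are observed and of high reliability, then widens the filter to include potential costs. The rows, the Cost_ID values and the exact spelling of the category labels are invented for the example; only the Implementation and Method_reliability column names come from the database.

```python
# Hypothetical rows mimicking two InvaCost columns; all values are invented.
entries = [
    {"Cost_ID": "A1", "Implementation": "Observed", "Method_reliability": "High"},
    {"Cost_ID": "A2", "Implementation": "Potential", "Method_reliability": "High"},
    {"Cost_ID": "A3", "Implementation": "Observed", "Method_reliability": "Low"},
]


def filter_entries(rows, implementation=("observed",), reliability=("high",)):
    """Keep rows whose Implementation and Method_reliability match (case-insensitive)."""
    return [
        row for row in rows
        if row["Implementation"].strip().lower() in implementation
        and row["Method_reliability"].strip().lower() in reliability
    ]


# Conservative subset: observed costs of high reliability only.
conservative = filter_entries(entries)

# Broader objective: also keep potential costs, still restricted to high reliability.
broader = filter_entries(entries, implementation=("observed", "potential"))

print([row["Cost_ID"] for row in conservative])
print([row["Cost_ID"] for row in broader])
```

Keeping the filter criteria as function arguments makes it easy to state in a report exactly which subset of the database was analysed.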

Is there a risk of encountering duplicates and/or overlaps among cost entries?

While we do our best to limit/exclude exact cost duplicates, InvaCost may contain cost entries that are actual or potential duplicates, or overlaps that may be (i) temporal: recent cost entries that incorporate older ones; (ii) geographic: cost entries that cover a large spatial scale and may include costs provided at smaller scales; (iii) taxonomic: cost entries covering several species that may include costs also provided separately for single species (e.g., ‘Rodentia’ order with ‘diverse/unspecified’ species, while there are other costs for Rattus spp. and Mus spp. in the same area); (iv) methodological: the same cost reported as different entries by (at least two) different studies. We have a descriptor (Overlap column) indicating these overlapping costs. Nonetheless, any user should be aware of, and acknowledge, this potential limitation, and thus perform specific data processing prior to any analyses. Depending on the objectives of a particular study, there are different options to deal with these duplicates, including for instance (1) working at specific spatial scales below (and including) the country level to avoid double counting in some continental or regional costs (although this strategy may not suit species for which few costs are provided at those spatial scales, since their larger-scale costs would be removed), or (2) identifying for each species (through a table) cost data that are recorded in the same geographic region/country with the same cost values.
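The second option can be sketched in Python as a simple grouping of entries by species, country and cost value. This is only an illustration: the rows are invented, and the "Species" and "cost" field names are placeholders chosen for the example rather than a statement of the exact database column names.

```python
from collections import defaultdict

# Hypothetical entries; "Species" and "cost" are placeholder field names.
entries = [
    {"Cost_ID": "B1", "Species": "Rattus rattus", "Official_country": "France", "cost": 1000.0},
    {"Cost_ID": "B2", "Species": "Rattus rattus", "Official_country": "France", "cost": 1000.0},
    {"Cost_ID": "B3", "Species": "Rattus rattus", "Official_country": "Spain", "cost": 1000.0},
    {"Cost_ID": "B4", "Species": "Mus musculus", "Official_country": "France", "cost": 250.0},
]


def candidate_duplicates(rows):
    """Group entries sharing species, country and cost value; return groups of 2 or more."""
    groups = defaultdict(list)
    for row in rows:
        key = (row["Species"], row["Official_country"], row["cost"])
        groups[key].append(row["Cost_ID"])
    return {key: ids for key, ids in groups.items() if len(ids) > 1}


suspects = candidate_duplicates(entries)
for key, ids in suspects.items():
    print(key, ids)
```

Entries flagged this way are only candidates: each group still needs to be checked against the source documents before anything is removed.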

How can we correct and/or expand cost information in the database, and then use these novel data for our analyses?

We welcome any feedback that could improve our database. Any reader or user can therefore add new information and/or correct existing information within InvaCost using our dedicated template (see Question #1). Contributors are expected to send a message (along with the template filled with corrected and/or new cost entries) to our e-mail address: updates@invacost.fr. Contributors should provide access to the document from which the information was taken. Similarly, we welcome any idea or suggestion to improve the structure of InvaCost through the same email address. As you may see, the structure of the database has substantially evolved since its first release. When you use your corrected dataset, you should simply state clearly and transparently in your article/report that (and in which way) you altered the original dataset for your purposes. Also, we recommend that the final dataset is systematically linked to the article/report (e.g. as supplementary material, where all new or corrected data are highlighted).

How do you proceed when additional cost(s) or data corrections are sent to updates@invacost.fr?

Any cost information received needs to be double-checked, standardized and validated by our team before being integrated into the official database. This process may take some time, and we add new cost information to the public (online) database only as part of official updates. In the meantime, we compile and progressively integrate (offline) all these new cost data.

Descriptor “Cost_ID”

Could you explain how I should assign an ID to a new cost entry that I propose for integration into the InvaCost database?

Actually, we manage this descriptor internally, and assign new cost IDs when the database is publicly updated.

Descriptors “Availability/Previous_materials”

Question on source literature

Question: In some cases, I am not sure if the paper is a mismatch or a proper paper for the database. For example, in one paper they mention some IAS cost only in passing, for example in the Introduction or Discussion, but they are citing another publication which I cannot access, e.g.: “Removal of water hyacinth (Eichhornia crassipes) in Wenzhou city cost 128 million USD in 1996 (Ding and Xie 1996). An estimate for the economic loss caused by Mikania micrantha on Neilingding Island ranges from 0.56 to 1.26 million USD (Zhong et al. 2004).” These cited publications seem to be in Chinese, so I cannot access the original material, and the information reported in the accessed paper is very scarce, so I would not be able to get all the information we need (e.g. time scale, spatial coverage, etc. - for the paper in question I think there will be no data for 90% of the types of information we need, simply because they just briefly cited other papers in two short sentences). What should I do in such cases: include this paper and the brief information reported in those sentences, or exclude it from the dataset?

Answer: We opted for the following strategy. First, we include the papers Ding and Xie 1996 and Zhong et al. 2004 as main sources (for which we give the dedicated title, publication year, authors, etc.), then indicate no in the Availability column; we mention the reference from which the unreachable papers came in the Previous_materials column. Second, the associated information (geographical situation, time range, etc.) is captured from the accessible source citing the original reference from which the cost comes. We encourage you to add any necessary information in the Details column (for instance, you can mention that this estimate comes from an inaccessible reference).

Question on literature availability

Question: I am not completely sure which publication the Availability column refers to (e.g. see the issue I described in the previous point). If paper A cites paper B regarding some cost, and paper A is available through online access but paper B is not, I was still noting that as “Freely accessible”, meaning that the paper I describe in the entry is freely available (even though the cited one is not). This is a bit confusing, I guess, since in such situations we have two papers, the citing and the cited one, so the question is which one we are defining here?

Answer: It refers to the cited one. If paper B is not accessible, we enter the Reference title, Authors and Publication year of paper B, then put No in the Availability column (and the full citation of paper A in the Previous_materials descriptor). In the Details column, we also quote the full sentence from paper A in which paper B was mentioned.

Descriptors “Environment/Environment_IAS”

Question on how the impacted environment is reported in the database

Question: If an article reports a loss in property values per house on a lakeside, but tells us neither how many houses there are nor whether the loss is for one year or over a longer period, which environment is affected: terrestrial, as it is a house AT the lake, or aquatic, as the invader causing the cost is an aquatic plant?

Answer: You can record both pieces of information, as we have specific columns for (i) the environment impacted by the cost (Environment descriptor) and (ii) the environment of the invasive alien species concerned (Environment_IAS descriptor).

Descriptor “Official_country”

How to manage entries with multiple countries?

Question: How should we treat information like “Colombia/Ecuador/Peru” in the ‘Official_country’ column when analysing cost data? Could we divide or replicate the cost for each of the three countries separately?

Answer: It is not realistic to replicate the same cost data for each country when the cost was provided at a regional (and not a country) scale. One could suggest dividing the total cost by the number of countries, and then assigning the resulting cost estimate to each country. However, this would be highly speculative, given that we have no precise idea of how the cost is actually distributed (probably not equally) between these countries.
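One possible way to respect this in a per-country analysis is to set multi-country entries aside instead of splitting them. The Python sketch below is an illustration only: the rows and the "cost" field are invented, and it assumes that countries are separated by "/" as in the example quoted in the question.

```python
def split_countries(official_country):
    """Split an Official_country value such as 'Colombia/Ecuador/Peru' into names."""
    return [name.strip() for name in official_country.split("/") if name.strip()]


# Hypothetical entries; the "cost" field and its values are invented.
entries = [
    {"Cost_ID": "D1", "Official_country": "Colombia/Ecuador/Peru", "cost": 900.0},
    {"Cost_ID": "D2", "Official_country": "Peru", "cost": 120.0},
]

# Entries attributable to exactly one country can feed per-country totals.
country_level = [e for e in entries if len(split_countries(e["Official_country"])) == 1]

# Multi-country entries are kept apart and analysed at the regional scale only.
regional = [e for e in entries if len(split_countries(e["Official_country"])) > 1]

print([e["Cost_ID"] for e in country_level])
print([e["Cost_ID"] for e in regional])
```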

Descriptor “Spatial_scale”

Question on how spatial scales are entered in the database

Question: If an article reports a loss in property values per house on a lakeside, but does not tell us how many houses there actually are, how should I deal with it?

Answer: You must record the information as it is provided, i.e. take the cost as given by the authors (there is no need to multiply by the number of houses), and specify the unit (house) in the Spatial_scale column.

Question on how to categorize costs estimated at different levels

Question: How do we treat costs that are given at a per individual level and/or are estimated as a total savings due to some effective control measure – e.g. “Reduced respiratory ailments costing approximately USD23.85 per sufferer induced by the weed per annum. With a reduction of approximately 350,000 people suffering from hay-fever as a result of biocontrol in 2005, this equates to an estimated health benefit of approximately USD8.4M in 2004/05 dollar terms” and another example “Biocontrol is estimated to reduce production losses in Victoria by approximately USD4.4M per annum through the following channels…”?

Answer: For the first case (i.e. costs that are given at a per-individual level), we opted to create a dedicated category in the Spatial_scale column, that is ‘unit’ followed by the unit in which the cost was expressed (in brackets). In your case, it would be unit (sufferer). For the second case (i.e. costs that are estimated as total savings due to some effective control measure), we considered them as such by assigning them to avoided costs in the Type_of_cost column and potential in the Implementation column. Indeed, these costs have not been incurred by people, but would have been without effective control.

Descriptor “Period_of_estimation”

Question on how to deal with missing temporal information

Question: If an article reports an economic loss, but does not tell us if this is for one year or a loss over a certain period of time, how should I deal with it?

Answer: You should indicate unspecified in the Period_of_estimation column, and also in the Time_range, Probable_starting_year and Probable_ending_year columns (if this information is also not provided).

Descriptor “Time_range”

Question on unclear cost duration

Question: I have costs that are X USD in total, inferred from the last 10 years prior to the publishing year of that study. I can enter the ‘probable starting year’ and ‘probable ending year’, the ‘raw cost estimate’, but I have some doubt about the ‘Time_range’ descriptor. Could you provide me with some guidance?

Answer: You should enter year if the cost is provided as an annual value expected to occur each year over the last 10 years. Conversely, you should enter period if the cost is provided as a total sum over the 10 years.
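To see why this choice matters, the Python sketch below shows how the same reported figure leads to very different annual values depending on the Time_range entry. The even split of a period total over an inclusive count of years is an assumption made for illustration, not a description of the exact procedure applied in the database, and the numbers are invented.

```python
def annual_cost(raw_cost, time_range, start_year, end_year):
    """Return a per-year value for a cost entry.

    Assumption for illustration: a 'period' total is spread evenly over
    the inclusive number of years; a 'year' value is already annual.
    """
    kind = time_range.strip().lower()
    if kind == "year":
        return raw_cost
    if kind == "period":
        n_years = end_year - start_year + 1
        if n_years < 1:
            raise ValueError("end_year must not precede start_year")
        return raw_cost / n_years
    raise ValueError(f"unexpected Time_range value: {time_range!r}")


# Hypothetical entry: a figure of 50,000 reported for 2001-2010.
as_period = annual_cost(50_000, "period", 2001, 2010)  # total over the 10 years
as_year = annual_cost(50_000, "year", 2001, 2010)      # 50,000 in each year

print(as_period, as_year)
```

With these invented numbers, a wrong Time_range entry changes the annual value by a factor of ten.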

Descriptors “Probable_starting_year / Probable_ending_year”

What is the difference between these descriptors and those that have ‘adjusted’ as an additional term?

The probable_starting_year & probable_ending_year columns (named original columns hereafter) contain the information extracted from the processed sources, which sometimes provide no clear information. This explains why we have unspecified in some cases. The probable_starting_year_adjusted & probable_ending_year_adjusted columns are those we created for analysis purposes. These columns contain the same information as the original columns, except where the original columns contain unspecified. In those cases, we filled the cells (only when possible) with an actual year based on educated guesses (e.g. corresponding to the publication year or calculated from the duration, when provided).

Descriptors “Probable_starting_year_adjusted / Probable_ending_year_adjusted”

Question on missing information

Question: Why do some cost entries have no information in these columns?

Answer: When we are not able to estimate them based on the information provided by the authors (e.g. Period_of_estimation column), we leave them blank, and we do not calculate the cost estimates per year. This means that the cost entries that have no information for at least one of these columns are not included in most of the cost analyses implemented in the package.
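The consequence for analyses can be sketched in Python as follows. The rows are invented, and a blank cell is represented here by None, which is a choice made for the example.

```python
# Hypothetical entries; None stands for a blank cell.
entries = [
    {"Cost_ID": "C1", "Probable_starting_year_adjusted": 2005, "Probable_ending_year_adjusted": 2007},
    {"Cost_ID": "C2", "Probable_starting_year_adjusted": None, "Probable_ending_year_adjusted": 2010},
    {"Cost_ID": "C3", "Probable_starting_year_adjusted": 1998, "Probable_ending_year_adjusted": None},
]

YEAR_COLUMNS = ("Probable_starting_year_adjusted", "Probable_ending_year_adjusted")


def has_both_years(row):
    """True only when both adjusted year columns are filled."""
    return all(row[column] is not None for column in YEAR_COLUMNS)


usable = [row["Cost_ID"] for row in entries if has_both_years(row)]
dropped = [row["Cost_ID"] for row in entries if not has_both_years(row)]

print("usable for temporal analyses:", usable)
print("left out:", dropped)
```

Listing the dropped entries explicitly is a simple way to report how much of the dataset a temporal analysis leaves out.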

Question on further use for data analysis

Question: Could I fill these cells when there is no information, so that the ‘invacost’ package will consider these data for temporal analyses?

Answer: As for any missing information in the database, you can fill the blank cells based on your own assumptions. In that case, if you want to use the ‘invacost’ package, you should load your altered dataset rather than the one shipped with the package, as the package only includes the official versions of the database.

Descriptor “Occurrence”

Question on incomplete information from the reporting document

Question: I have costs that are X USD in total, inferred from the last 10 years prior to the publishing year of that study. I can enter the ‘probable starting year’ and ‘probable ending year’, the ‘raw cost estimate’, but I have some doubt about the ‘Time_range’ descriptor. Could you provide me with some guidance?

Answer: If the cost is not expected to continue after these 10 years, you can indicate one time in the Occurrence column. Otherwise, if there is some reason to suspect that the cost continued after it was reported in the source document, you should enter potentially ongoing.

Descriptor “Raw_cost_estimate_local_currency”

Question on how to collate specific information

Question: How do I report cost estimates in cases when there were both pooled data and data per species provided?

Answer: You should report data per species. However, when all data were reported for the same species or group of species, and the authors provided both pooled and very detailed data for different cost categories (e.g. price per gas, manpower, overheads, etc.), you should opt for pooled data. In some cases, authors reported different cost values for different years, even though all the other descriptors (species, locality) were the same. In that case, you should report each value separately for each year.

Question on multiple cost data / scenarios

Question: One paper I am adding has avoided costs for two scenarios (depending on climate change and management effort): “Under moderate climate change (+1.5 °C by 2050) discounted net benefits (4% discount rate) for the whole period range from 19 to 582 million euros depending on the effectiveness of the policy impact. With stronger climate change (+2.4 °C by 2050) the total discounted savings over the full period are expected to range from 42 to 1063 million euros assuming a highly effective policy.” Which values should I add? (1) pick one scenario, (2) add both scenarios as separate entries, or (3) average the costs somehow?

Answer: When there are several cost figures from different scenarios, we suggest taking the smallest cost (here, 19 million €) as the minimum cost estimate and the highest cost (here, 1063 million €) as the maximum cost estimate. From there, you can use the median cost (of all the known values) as the Raw cost estimate.
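Applied to the four figures quoted in the question, this rule can be worked through in a few lines of Python. The variable names are chosen for the example; the values are those of the quoted passage, in million euros.

```python
from statistics import median

# The four values quoted in the question, in million euros.
scenario_values = [19, 582, 42, 1063]

minimum_estimate = min(scenario_values)   # smallest figure across scenarios
maximum_estimate = max(scenario_values)   # highest figure across scenarios
raw_estimate = median(scenario_values)    # median of all the known values

print(minimum_estimate, maximum_estimate, raw_estimate)
```

With an even number of values, the median is the mean of the two middle ones, here 42 and 582.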

Descriptor “Raw_cost_estimate_2017 (USD_exchange_rate or USD_PPP)”

How should we convert the collected costs to 2017 USD?

We ask collectors to record only the raw cost estimate as provided in the source document. The cost standardisation to 2017 USD is carried out by us. We are working on implementing this step directly into the ‘invacost’ package.

Question on which raw cost estimate should be collated

Question: I have a list of direct costs and a single “total cost” for each species, where the total cost includes both direct and indirect costs. I am thinking about adding all of them as separate lines in the database (with direct costs separated by sector where possible), but technically that’s repeating some of the costs - what would you suggest?

Answer: As you do not have a comprehensive list for the indirect costs (as is the case for direct costs here), I suggest collating only total costs per species, so that you will not miss the estimates associated with indirect costs (which would be the case if you opted for adding direct costs only). If you feel some costs may be repeated, you can specify this information in the Overlap column. Also, you can provide a full explanation of your choice in the ‘Details’ descriptor, so that readers and users will be aware that complementary details can be found in the source document.

Descriptor “Cost_estimate_per_year_2017 (USD_exchange_rate or USD_PPP)”

Why do some cells not contain cost values?

There are cost entries for which we have inadequate time period information: some studies did not provide time periods, which can cause substantial biases when analyzing the temporal trends of costs. Sometimes, this is not problematic as we can safely guess when costs occurred. In other cases, the exact or approximate duration of costs that spread over several years/decades is hard to guess; therefore, it was impossible to safely derive cost estimates per year for these cost entries.

Descriptor “Type_of_applicable_year”

If there is no way to infer the type of applicable year, what do I put in that column?

Actually, the applicable year is the year used for standardizing costs to 2017 USD. You must fill the cell with either effective if the authors specified the information in their paper, or publication year otherwise. Therefore, you should use the publication year of the source in your case.
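The rule amounts to a simple fallback, sketched here in Python. The function and argument names are hypothetical, and the years are invented.

```python
def applicable_year(publication_year, effective_year=None):
    """Return (year, type) following the fallback rule described above.

    effective_year is the year stated by the authors for the cost value,
    or None when the paper does not specify it.
    """
    if effective_year is not None:
        return effective_year, "effective"
    return publication_year, "publication year"


# Authors state that the cost is expressed in 2009 values, in a paper published in 2012.
stated = applicable_year(2012, effective_year=2009)

# No way to infer the year from the paper: fall back on the publication year.
fallback = applicable_year(2012)

print(stated, fallback)
```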

Descriptor “Implementation”

Question on how to distinguish between potential and observed costs

Question: I sometimes struggle to distinguish between “observed” versus “potential” costs. For example, I read the following “We estimate hogweed prevalence at circa 0.1 hectare per district. Given 323 rural districts, yields an expectation of 0.323 km2 inhabited by hogweed, whose removal would cost over € 5.6 million. However, these control efforts are not being undertaken, it must be further assumed that less than 10 % of this infested area is being dealt with (author’s estimate).” Then later in the text these € 5.6 million are listed as “annual costs incurred by giant hogweed infestation in Germany.” The authors classify it as an incurred –> observed cost, despite previously mentioning that only 10% of this estimated amount is actually being spent. I have classified it as “potential” although I still have doubts because the estimate is based on the species’ current distribution. What do you say?

Answer: We agree with your choice, i.e. classifying this cost as potential because the control efforts were not undertaken. Note that we are currently refining this descriptor in order to consider simultaneously the distribution of the invasive species, the nature (realized or not) of the cost, and the temporality (past/current or predicted) of the quantified impact in the study area.

Descriptor “Acquisition_method”

Question on how to identify “extrapolation” costs

Question: You mention that “extrapolation” relies on computational modeling. Is it always the case? For example, if it’s a simple area-based upscaling without any modeling involved, would it be estimation or extrapolation?

Answer: It is likely that a large part of reported/estimated costs comes from spatial up-scaling (for instance, costs reported at a national scale are probably not systematically the result of estimations from all the regions of the country, but often derive from estimations made for one or a few areas within this country). However, in most cases the authors do not explicitly state it. Therefore, we have decided to indicate extrapolation only when it is clearly indicated in the paper (for instance, via formula or calculations over a long time period or area).

Descriptor “Impacted_sector”

Question on how to identify the relevant impacted sector

Question: When the costs are for eradication, what is the appropriate “impacted sector”? Is it the sector where most of the costs of the invader would be if it weren’t eradicated, or should it be authority/stakeholder costs because they are the ones paying for the eradication?

Answer: In cases where we don’t have exact information on the sector(s) impacted by the cost, we put Diverse/unspecified when the costs are explicitly related to damage or loss. When this is about management – like in your case – we enter authority/stakeholders.

Descriptor “Type_of_cost”

Question on non-monetary estimation method

Question: The paper on the economic impact of Acacia in Israel actually reports that the annual mean willingness to pay (WTP) for containment or eradication of A. saligna was USD 8.41 and USD 8.83, respectively. This is a non-monetary method. So I guess we should not include it in InvaCost?

Answer: Actually, we report all economic estimates irrespective of the method used. In particular, WTP is a method that was used for several cost estimates already recorded in our database. You can therefore enter this containment/eradication cost and indicate in the Details column that the method used is the mean willingness to pay.

Question on avoided costs

Question: In some papers, they report that control/eradication investment contributed to a certain economic value that was saved (e.g. ecosystem services, agriculture production, etc.) - something that would be destroyed by IAS if such control effort was not implemented (e.g. “It has been estimated that if sea lamprey was not controlled, the loss of fishing opportunities and indirect economic impacts could be greater than USD 500 million annually”). I was also including such costs as “Extrapolation”/“Potential” costs - on one hand, that is something that did not happen and won’t happen, because there was control effort and such loss was prevented, but on the other hand that is a nice indication of the potential of IAS to cause economic cost, so it seemed appropriate to enter such economic benefits saved/prevented through control as estimates of the potential damage some species can produce. What do you think?

Answer: We agree with you. In addition, these potential cost estimates should be considered as avoided costs for the Type_of_cost descriptor.

Question on cost-benefit estimates

Question: I have two papers which report avoided damage costs using cost-benefit analyses. I interpret these as benefit values, and if I understand correctly we do not include those in InvaCost? I have included excerpts from the paper here below. What is your advice?

Answer: These costs should not be considered as actual benefits because they are not the results of the direct exploitation of the invasive species. We decided to consider them as avoided costs (for the Type_of_cost descriptor), and mark them as potential costs (for the Implementation descriptor). Indeed, they represent money that would have been lost if nothing was done.

Descriptor “Method_reliability”

Should I consider only cost data considered as of “high” reliability?

You should remember that this column does not reflect a systematic examination of the methodology for peer-reviewed articles and official reports (we did it only for grey materials that were likely not assessed by peers or experts), but rather reflects an objective categorisation based on transparent and consistent criteria. This strategy was used because assessing the reliability of the very diverse methods used across the taxonomic groups, habitats and economic sectors of hundreds of studies makes any objective dichotomy between high and low reliability highly challenging. Therefore, this descriptor should be considered as a first approximation of cost reliability, and you should decide whether or not you want to eliminate papers on this basis. To do that, take time to investigate the Details descriptor (especially for cost values that you deem suspiciously high) and potentially go back to the original source document to make your decision. For an example of how to do that, take a look at the “Determining cost estimate reproducibility” section in Bradshaw et al. 2016 (https://doi.org/10.1038/ncomms12986). You could also rely on the Method_reliability_refined descriptor (see below).

Question on methodology for grey materials

Question: I am not sure if the criterion I was applying for this column is a proper one. If the authors of a grey report did some estimation/calculation/extrapolation themselves, or cited others who did it, but in either case did not explain clearly how they came up with the value (in a clear enough manner so others could repeat the whole procedure), I was marking it as “Low”, as being a non-repeatable value. Moreover, if they reported some investment (e.g. cost of eradication in some area) but didn’t provide clear data about it (e.g. who funded it, who and what was exactly funded, etc.), e.g. if they only cited briefly some other paper as a source, I was also marking it as “Low”, as being non-repeatable - I am not sure if this is OK?

Answer: Yes, we agree with your choices as the estimation methodology for grey materials (other than official reports) should be transparent and reproducible to be considered as of high reliability.

Descriptor “Method_reliability_refined”

What is the difference between the ‘Method_reliability’ and the ‘Method_reliability_refined’ columns?

The reproducibility of published methods should not rely only on the nature of the materials, but also on the transparency and reproducibility of the methodology per se. The individual examination of the methods employed by the thousands of document sources analysed when building InvaCost came with a series of constraints and limitations, which are generally associated with very large datasets involving several collaborators (see question #33). Indeed, the attribution of reproducibility should be a qualitative procedure specific to each monetary estimate recorded, which requires specific expertise and time. The regular updates of the database, as well as the growing scientific community working on the InvaCost project, now make it possible to carry out this substantial work. The Method_reliability_refined descriptor relies on an expert-based assessment of reliability, and not on pre-defined criteria, and can be used as an alternative or as a complementary approach to the standard Method_reliability descriptor.

Why do some cost entries have empty cells for this descriptor?

We fill this column as we receive new expert-based assessments of the method reliability of the cost entries. This means that currently empty cells will be filled as the database is updated.

Descriptor “Overlap”

Question on how to deal with multiple costs from a single study

Question: There are some studies which have modelled the cost of managing (or containing) invasive species in a landscape. Like other modelling studies, these studies often use known parameters based on a small spatial scale to extrapolate the estimated cost for a larger spatio-temporal scale. While dealing with these studies, should we extract both pieces of information or only the extrapolated one?

Answer: You should extract both cost data, which will allow users to opt for one or the other depending on the analyses targeted and the research questions. However, you should indicate in the Overlap descriptor that both costs are overlapping.

Descriptor “Benefit_value(s)”

Question on whether and how to report monetary benefits

Question: My understanding is that this should be noted with “Yes” only if the authors also reported that the given IAS also produces some positive monetary values, and they reported such values in the paper? There is also a case where authors report the cost of controlling/eradicating some species, and they also provide some positive values that such control efforts produce (e.g. increased employment), but I think this should not be considered as a benefit value of IAS, as it is not a monetary gain IAS produces, right?

Answer: Yes, we consider Benefit_values as existing in the analysed sources only when money was gained from direct exploitation of the invasive alien species (IAS) studied. In the second case you explained, we considered those apparent benefit-like values as avoided costs, i.e. money that is no longer lost thanks to an efficient management strategy targeting the IAS involved. For instance, if a species A (impacting agriculture) was successfully controlled (control costs = 10,000 USD) and the net benefit is 13,000 USD, we use one line for control costs (= 10,000 USD) and another for avoided costs (= 13,000 USD). Alternatively, if this species A is used for trade activities that generate 14,000 USD, we simply enter yes in the Benefit_values column, but only if the source also contains (at least) one cost estimate, which is the first requirement for this source to be recorded in the database.