How do I select representative stations from 112 stations to do statistical analysis of the temp, rainfall and humidity?


I am modelling a 75,000 sq km watershed that has no in situ hydro-meteorological measurements, so I am depending on data sets that I downloaded from the Climate Forecast System Reanalysis (CFSR) website. After clipping the data to my study area, I have 112 stations, but I cannot analyse the data station by station because of time constraints. The resolution of the data is 38 km. SWAT is a catchment model used for rainfall-runoff modelling and climate change impact studies. It is also used for land use change impact studies in watersheds.

6 Answers

  1. Cluster analysis can address this: group stations with similar records and analyse one representative station per cluster.

  2. Hi Elias, first of all, I am not an expert in this matter, so my comments may not be useful at this point in your research. Have you tried GIS-related software such as BASINS? You could download the data from the stations and group them, or use HSPF to complement SWAT. I do not quite understand what you mean by “you don't have in situ measurements” if you have the information from the 112 stations. How long is the time scale for your simulation and analyses? This will be an important point for the data load, as well as for data restrictions and data management. I would also try to cluster the stations by similarity of the sampling area and the characteristics of the location. In any case, good luck!

  3. Just to clarify my assumptions here: where you say "no in situ measurements for hydro-meteorological data", I assume you mean "no precipitation measurements", because you then say "I have 112 stations", which I assume are stream discharge stations. To narrow down the pool of stations to evaluate, you could: (1) select 52 stations at random from the population (somewhat akin to a jackknife procedure; 52 is approximately the number of CFS Reanalysis grid points in your watershed at the stated grid resolution, since 75,000 sq km / (38 km × 38 km) ≈ 52); (2) do a cross-correlation analysis of the discharge time series for all 112 stations (a simple step, even in MS Excel), rank the correlations from lowest to highest (keeping track of which station pair produced each value), and then select the station pairs with the lowest correlations until you have a manageable collection of stations (however many you have time to analyze) for your evaluation step. The idea is to capture as much as possible of the overall variability of stream discharge observations across the watershed. Two stations in series on the same stream will likely be highly correlated, so you don't necessarily need both; you would be repeating information in your analysis. Two stations on different streams will be less correlated, so having both is useful for gauging how well you represent the spatial variability of precipitation-runoff processes in the watershed. Two stations on opposite sides of the watershed will likely be quite different and have a low correlation. Putting all of the stations into the mix means you get station relationships like that, but also some internal to the watershed, in different sub-watersheds, some at headwaters and some at outlets, and so on. The likely outcome is a pool of station locations spread all over the watershed, representing both the modeled area and its internal variability.

  4. Elias, I have a few documents that speak to this and might help. I would also check out some of these experts: Todd Gardner at World Research Institute and Rowan Schmidt at Earth Economics. Both have given presentations on questions like yours. I don't see a way to upload documents here, so send me an email: Cervantesbrenda60atgmaildotcom.
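
Two of the suggestions above, cluster analysis (answer 1) and correlation ranking (answer 3), can be sketched in a few lines of Python. This is only a rough illustration using the standard library, not a tested workflow for CFSR data. The function names, the choice of plain k-means with a farthest-point start, and the idea of describing each station by a few summary values (for example mean temperature, annual rainfall and mean humidity) are all assumptions made for the example. The correlation ranking uses signed Pearson correlation, as answer 3 describes; ranking by absolute value would be a reasonable alternative.

```python
import math
from itertools import combinations
from statistics import mean, pstdev


def standardize(rows):
    """Scale each column to zero mean and unit variance."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]  # guard against constant columns
    return [[(v - m) / s for v, m, s in zip(r, mus, sds)] for r in rows]


def nearest(point, centers):
    """Index of the center closest to the point."""
    return min(range(len(centers)), key=lambda j: math.dist(point, centers[j]))


def kmeans_representatives(features, k, n_iter=50):
    """Cluster stations with k-means; return the station nearest each centroid.

    `features` holds one row of summary values per station (hypothetical layout).
    """
    pts = standardize(features)
    # Deterministic farthest-point start, so repeated runs give the same result.
    centers = [pts[0]]
    while len(centers) < k:
        centers.append(max(pts, key=lambda p: min(math.dist(p, c) for c in centers)))
    for _ in range(n_iter):
        labels = [nearest(p, centers) for p in pts]
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(pts, labels) if lab == j]
            new_centers.append([mean(c) for c in zip(*members)] if members else centers[j])
        if new_centers == centers:  # assignments have stopped changing
            break
        centers = new_centers
    labels = [nearest(p, centers) for p in pts]
    reps = []
    for j in range(k):
        idx = [i for i, lab in enumerate(labels) if lab == j]
        if idx:
            reps.append(min(idx, key=lambda i: math.dist(pts[i], centers[j])))
    return sorted(reps)


def pearson(x, y):
    """Pearson correlation of two equal-length series (neither may be constant)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def least_correlated(series, n_keep):
    """Walk station pairs from lowest to highest correlation, collecting stations."""
    pairs = sorted(combinations(range(len(series)), 2),
                   key=lambda ij: pearson(series[ij[0]], series[ij[1]]))
    chosen = []
    for i, j in pairs:
        for s in (i, j):
            if s not in chosen and len(chosen) < n_keep:
                chosen.append(s)
        if len(chosen) == n_keep:
            break
    return sorted(chosen)


# Made-up numbers for illustration: (mean temp in C, annual rainfall in mm, humidity in %).
demo_features = [(15.0, 900.0, 60.0), (15.5, 950.0, 62.0), (16.0, 1000.0, 64.0),
                 (25.0, 300.0, 30.0), (25.5, 350.0, 32.0), (26.0, 400.0, 34.0)]
print(kmeans_representatives(demo_features, k=2))
```

With real data, each row of `features` would be built from a station's full record, and `k` (or `n_keep`) would be however many stations there is time to analyse. Both functions return 0-based station indices, so keep a list mapping indices back to station IDs.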