An Assessment of Snow Cover Duration Variability Among Three Basins of Songhua River in Northeast China Using Binary Decision Tree

Snow cover dynamics differ greatly among the basins of the Songhua River in Northeast China, owing to differences in topography and to changes in vegetation and climate since hydrological year (HY) 2003. Daily and flexible multiday combined snow cover products for HY2003 to HY2014 were produced from Moderate Resolution Imaging Spectroradiometer (MODIS) data acquired by the Terra and Aqua satellites. They cover three basins: the Nenjiang River Basin (NJ), the Downstream Songhua River Basin (SD) and the Upstream Songhua River Basin (SU). Snow cover duration (SCD) was derived from the flexible multiday combinations for each year. The results showed that SCD was significantly associated with elevation, with higher SCD values in the mountainous areas. The average SCDs of the NJ, SU and SD basins were 69.43, 98.14 and 88.84 d, with annual increases of 1.36, 2.04 and 2.71 d, respectively. Binary decision trees were used to analyze the nonlinear relationships between SCD and six impact factors. This method has previously been applied successfully to simulate the spatial distribution of snow depth and snow water equivalent. The impact factors comprised three topographic factors (elevation, aspect and slope), two climatic factors (precipitation and air temperature) and one vegetation index (Normalized Difference Vegetation Index, NDVI). Yearly SCD values were treated as the dependent variable and the six factors as independent variables. Six binary decision trees, one per basin with and without climate factors, were then built using the classification and regression tree (CART) algorithm. The model results show that elevation, precipitation and air temperature are the three most influential factors. Air temperature is the most important of these and ranks first in two of the three basins.
These results suggest that SCD in the mountainous areas may be more sensitive to climate warming, since precipitation and air temperature are the major factors controlling the persistence of snow cover there.


Introduction
Seasonal snow cover is an important component of the global surface heat budget, hydrological cycle and climate system (Balk and Elder, 2000; Stocker, 2014). Owing to its physical properties, snow cover has a profound effect on energy exchange and the heat budget (Foster et al., 2008). For instance, the high albedo of snow increases surface albedo by 30%-50%, and its low thermal conductivity limits heat transfer from the ground surface to the atmosphere. Snow cover also modulates the temperature feedbacks that control regional and global climate change. Snowfall tends to occur at low temperatures, which allow snow cover to persist for longer periods in winter, whereas higher temperatures in spring accelerate snowmelt. In addition, winter snowfall stores large amounts of water, especially in mountainous areas at middle and high latitudes, and spring snowmelt is a major water source for human use and agricultural irrigation. Thus, snow cover interacts with hydrological, biological, chemical and geological processes through its effects on the water cycle and energy balance.
TIROS-1 (Television Infrared Observation Satellite) was first used to monitor snow cover in Canada in 1964 (Lucas et al., 1990). Since then, snow cover has been mapped with various optical sensors, e.g., Landsat (Dozier, 1980; Rosenthal and Dozier, 1996), AVHRR (Advanced Very High Resolution Radiometer) (Brest et al., 1992), MODIS (Moderate Resolution Imaging Spectroradiometer) (Hall et al., 1995; 2001; 2002) and SPOT (Systeme Probatoire d'Observation de la Terre) (Dankers et al., 2010). Microwave sensors have also been used, e.g., SMMR (Scanning Multichannel Microwave Radiometer), SSM/I (Special Sensor Microwave/Imager) (Pulliainen and Hallikainen, 2001; Chang et al., 2016) and AMSR-E (Advanced Microwave Scanning Radiometer for EOS) (Chang et al., 2000; Derksen, 2008). Owing to its fine temporal resolution, MODIS has become a major source of optical data for snow cover monitoring, supplying daily, 8-day and monthly binary and fractional snow cover products (Hall et al., 2001; 2002). However, clouds resemble snow spectrally in the visible bands, which makes the two difficult to distinguish (Foster et al., 2008). Frequent cloud cover that obscures the ground therefore limits the application of MODIS snow cover products. Several cloud reduction strategies have been developed (Liang et al., 2008; Parajka and Blöschl, 2008; Wang et al., 2009; Gao et al., 2010; Parajka et al., 2010; Paudel et al., 2011; Zhang et al., 2012; Chen et al., 2014). These include daily combination of MODIS Terra and Aqua data, temporal filters, spatial filters, snow line and snow cycle methods, and combination of MODIS with microwave data. Daily combination of Terra and Aqua is generally accepted as the first step of cloud reduction, but the subsequent steps vary among studies. In Northeast China, the overall accuracies of the MOD10A2, MOD10C2 and AMSR-E snow products have been assessed as 69.3%, 76.6% and 76.3%, respectively (Lei et al., 2011; Zhong et al., 2010). Based on the work of Gao et al. (2010) and Wang et al. (2009), daily combination and flexible multiday combination products were produced from daily Terra and Aqua data (Chen et al., 2014) and are used in this study.
The persistence of snow cover is influenced by climate (Lettenmaier, 2005; Sacks et al., 2007; Stocker, 2014), vegetation (Davis et al., 1997; Georg et al., 2007) and topography (Daly et al., 1994; Sacks et al., 2007; Tong et al., 2009; Litaor et al., 2015). Statistical models use point-based measurements to analyze the effects of multiple factors on snow cover persistence. Such models ignore the physical processes of snow accumulation and melt, and are limited by sample size (Daly et al., 1994; Molotch et al., 2005). Snow parameters obtained from remote sensing images can provide additional data. Even so, such statistical models work effectively only in small areas (~3200 km²) and fail to combine the influence of multiple factors (Prokop et al., 2008). Binary decision trees were initially used to produce snow depth and snow water equivalent (SWE) products for small watersheds from point-based measurements (Elder et al., 1995). They have since been compared with geostatistical techniques such as co-kriging (Balk and Elder, 2000; Winstral et al., 2002; Elder et al., 1998). The method was then applied to analyze the sensitivity of snow cover changes to variability in climate, vegetation and topography (Molotch and Meromy, 2014). Commonly used independent variables are net solar radiation, slope, elevation, aspect, vegetation type, air temperature and wind redistribution (Molotch et al., 2005). The interactions between snow cover dynamics and topography, vegetation cover and climate change remain important and require further study.
Northeast China is one of the three largest seasonal snow regions in China, with average winter snowfall of 30 to 150 mm per year (Zhen et al., 1993). Current studies mainly focus on snow mapping algorithms and snow patterns based on climate observations and multi-source satellite data (Lei et al., 2011; Sun et al., 2010). Previously, the relationship between snow cover and its impact factors was analyzed over the whole region, and intercomparison among watersheds was rarely considered (Li et al., 2014; Song et al., 2009). In this context, the objectives of this study are to: 1) examine and compare the snow dynamics of three basins in Northeast China from hydrological year (HY) 2003 to 2014; 2) establish binary decision trees to describe the nonlinear relationships between snow cover duration (SCD) and its controlling factors; and 3) rank the importance of the controlling factors. Flexible multiday snow cover products and SCDs for each hydrological year were produced from MODIS Terra and Aqua daily snow cover products. After the snow patterns were explored, binary decision trees were applied to analyze the roles of NDVI, elevation, slope, aspect, precipitation and air temperature in determining SCD in the three basins. The aim was to identify the major controlling factors and their influence on the persistence of snow cover.

Study area
The study area comprises three basins of the Songhua River system in Northeast China (Fig. 1), drained by the Nenjiang River, the mainstream Songhua River and the Second Songhua River. The corresponding basins were delineated according to the spatial distribution of the three rivers: the Nenjiang River Basin (NJ), the Downstream Songhua River Basin (SD) and the Upstream Songhua River Basin (SU) (Table 1). The SD is located in the middle of the Northeast China Plain, and the SU lies in the mountainous areas of the Changbai Mountains. The NJ covers mountainous areas as well as the transition zone from the Northeast China Plain to the Mongolian Plateau. The basin boundaries were derived from the Shuttle Radar Topography Mission (SRTM3) DEM using ArcGIS hydrological tools. The steps were filling sinks, deriving flow direction, calculating flow accumulation, extracting the drainage network and delineating the basin boundaries.
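The flow-direction step of this workflow can be illustrated with a minimal D8 sketch. It assumes a north-up DEM whose sinks have already been filled, a 90 m cell size (SRTM3) and the ESRI D8 direction codes (1 = east, doubling clockwise to 128 = northeast). It is not the ArcGIS implementation, and flow accumulation and basin delineation are omitted.

```python
import numpy as np

# ESRI-style D8 codes, clockwise from east: (row offset, column offset, code).
# Rows increase southward in a north-up raster.
D8_NEIGHBOURS = [
    (0, 1, 1),     # east
    (1, 1, 2),     # southeast
    (1, 0, 4),     # south
    (1, -1, 8),    # southwest
    (0, -1, 16),   # west
    (-1, -1, 32),  # northwest
    (-1, 0, 64),   # north
    (-1, 1, 128),  # northeast
]


def d8_flow_direction(dem, cell_size=90.0):
    """Return a D8 code per interior cell; 0 marks edge cells and cells with no downhill neighbour."""
    dem = np.asarray(dem, dtype=float)
    n_rows, n_cols = dem.shape
    flow_dir = np.zeros((n_rows, n_cols), dtype=np.int32)
    for r in range(1, n_rows - 1):
        for c in range(1, n_cols - 1):
            best_drop, best_code = 0.0, 0
            for dr, dc, code in D8_NEIGHBOURS:
                # Diagonal neighbours are sqrt(2) cell sizes away.
                distance = cell_size * (np.sqrt(2.0) if dr != 0 and dc != 0 else 1.0)
                drop = (dem[r, c] - dem[r + dr, c + dc]) / distance
                if drop > best_drop:  # keep the steepest downhill neighbour
                    best_drop, best_code = drop, code
            flow_dir[r, c] = best_code
    return flow_dir
```

Cells with no lower neighbour keep code 0. This is why sink filling has to precede this step in the workflow described above.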

MODIS data processing
In this study, the MODIS daily snow cover products MOD10A1 (Terra) and MYD10A1 (Aqua) were used, covering 12 hydrological years (HY) from 2003 to 2014. A hydrological year is defined here as the period from September 1 to April 30 of the following year. Eight MODIS tiles (h25v04, h25v03, h26v03, h26v04, h27v04, h27v05, h28v04 and h28v05) were required to cover the whole study area, and these were mosaicked using the MODIS Reprojection Tool (MRT) (Hall et al., 2001; 2002).
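The function below is a small sketch of this definition that assigns a date to its hydrological year. The text does not state how years are labelled, so it assumes each HY is named after the calendar year in which it ends (HY2003 runs from 1 September 2002 to 30 April 2003). Dates from May to August fall outside any HY.

```python
from datetime import date
from typing import Optional


def hydrological_year(d: date) -> Optional[int]:
    """HY label of a date, assuming HYs are named after the year in which they end."""
    if d.month >= 9:       # September-December start the season ending next year
        return d.year + 1
    if d.month <= 4:       # January-April close the season ending this year
        return d.year
    return None            # May-August lies outside the defined HY


def season_days(hy: int) -> int:
    """Number of days from 1 September (hy - 1) to 30 April hy, inclusive."""
    return (date(hy, 4, 30) - date(hy - 1, 9, 1)).days + 1
```

Under this convention a season has 242 days, or 243 when the following February is a leap month. This matters if SCD is later expressed as a fraction of the season.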
The combination process involves two steps: daily combination and flexible multiday combination (Wang et al., 2009; Gao et al., 2010; Chen et al., 2014). First, the MOD10A1 (Terra) and MYD10A1 (Aqua) maps acquired on the same day were merged into one MODIS daily combined snow cover product (MODISDC). The merge used the following priority order: snow cover > lake ice > water > cloud > polar night/darkness > missing data > no decision. The MODISDC maps were then used as input to the flexible multiday combination product (MODISMC), which is controlled by two thresholds: a maximum cloud percentage P ≤ 10% and a maximum composite length N ≤ 8 d. Successive daily maps are added to a composite until either threshold is reached, that is, until cloud cover falls to 10% or less, or the composite spans 8 days. Validated against in situ measurements, the daily combination and flexible multiday combination products had overall accuracies of 47.51% and 76.52%, respectively (Chen et al., 2014). Snow cover and lake ice pixels in the MODISMC products were reclassified as snow (coded 1), and all other pixels as non-snow (coded 0). The snow cover duration (SCD) map for each hydrological year was calculated by overlaying all the binary snow cover maps of that year.
Following the land cover classification of Loveland and Belward (1997) and Loveland (2000), all pixels covering water bodies, settlements and snow were removed, and annual maximum NDVI values were calculated. Elevation, slope and aspect were derived from SRTM3 data with a spatial resolution of 90 m. Daily air temperature and precipitation data for 96 permanent meteorological stations were downloaded from the China Meteorological Data Sharing Service System (http://cdc.cma.gov.cn/). Mean daily air temperature and total precipitation of the 12 hydrological years were interpolated over the whole of Northeast China. Elevation, aspect, slope, air temperature and precipitation were resampled to 500 m to match the MODIS snow products. Grid-based SCD and the six controlling factors were sampled at an interval of 0.1° to construct the binary decision trees. All parameters were standardized using z-scores in MATLAB and then used to build the binary decision trees.
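The two combination steps and the SCD calculation can be sketched in NumPy as follows. The integer class codes are hypothetical labels numbered in the priority order given above, not the MODIS HDF values. Clear land (no snow) is absent from the priority list, so its placement is an assumption. The sketch also assumes three things the text does not specify:

- composites are consecutive and non-overlapping;
- the same priority rule is applied when stacking days;
- each composite's snow/non-snow value counts for every day the composite spans when SCD is summed.

```python
import numpy as np

# Hypothetical class codes (NOT the MODIS HDF values), numbered in the merge
# priority order of the text, so a smaller code always wins a merge.
# LAND (clear, no snow) is not in the text's list; placing it after WATER is an assumption.
SNOW, LAKE_ICE, WATER, LAND, CLOUD, NIGHT, MISSING, NO_DECISION = range(8)


def daily_combine(terra, aqua):
    """Merge same-day Terra and Aqua maps, keeping the higher-priority class per pixel."""
    return np.minimum(terra, aqua)


def flexible_multiday(daily, max_cloud=0.10, max_days=8):
    """Return (start_day, n_days, composite) for consecutive, non-overlapping composites.

    A composite grows one day at a time until its cloud fraction is <= max_cloud
    or it spans max_days days, whichever comes first.
    """
    composites, t = [], 0
    while t < len(daily):
        composite, n = daily[t].copy(), 1
        while (np.mean(composite == CLOUD) > max_cloud
               and n < max_days and t + n < len(daily)):
            composite = np.minimum(composite, daily[t + n])  # same priority rule
            n += 1
        composites.append((t, n, composite))
        t += n
    return composites


def snow_cover_duration(composites):
    """Per-pixel sum of days covered by composites classed as snow (snow or lake ice)."""
    scd = np.zeros(composites[0][2].shape, dtype=int)
    for _, n_days, composite in composites:
        scd += n_days * np.isin(composite, (SNOW, LAKE_ICE))
    return scd
```

Because the codes are ordered by priority, both merges reduce to a pixel-wise minimum. This keeps the priority rule in a single place.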

Binary decision tree
Binary decision trees have proven effective for modeling the spatial distribution of snow depth and SWE. The method reveals nonlinear and hierarchical relationships between SCD and its impact factors. Trees were constructed with the classification and regression tree (CART) algorithm built into MATLAB, which predicts a response Y from inputs X₁, X₂, …, Xₚ. Here, Y is SCD, and X₁, X₂, X₃, X₄, X₅ and X₆ are elevation, slope, aspect, NDVI, air temperature and precipitation, respectively. The tree is built by repeatedly splitting the data into two subsets (nodes). Starting from the root node, each split sends observations to the left or right branch according to a test of the form "Is Xc < M?". In this test, Xc is the splitting variable and M is the split value. Each terminal (leaf) node gives a prediction obtained by averaging or aggregating all training data that fall into that leaf (Breiman, 1984; Yisehac and John, 2006). The input dataset was divided into 10 subsets of equal size; nine were used to train the tree and the remaining one for accuracy assessment. The best combination of independent variables for each of the three basins was determined through this cross-validation, based on the error rate of the candidate trees. Minimum error pruning was then applied to the candidate trees. Working from the bottom up, the error of each non-leaf node and of its sub-branches was calculated and weighted by the number of training samples in each sub-branch. The variables and the number of terminal nodes that minimized both the model deviance and the mean square error were selected as the best predictors, giving the minimum-cost tree. After the best-fit tree was built, the independent variables were ranked by their order and frequency of occurrence. This yields a priority order rather than a ratio-scale measure of importance (Elder, 1995). Variables selected first play the dominant role in splitting nodes, whereas those selected later play minor roles.
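The tree-fitting procedure can be sketched in Python with scikit-learn instead of MATLAB. Two substitutions should be noted. First, pruning uses scikit-learn's cost-complexity pruning rather than minimum error pruning. Second, the pruning level is chosen by the lowest 10-fold cross-validated mean square error. The data are synthetic. The column order follows X₁ to X₆ above, and `FACTORS` and `fit_min_cost_tree` are illustrative names. The example compares Case 1 (all six factors) with Case 2 (topography and NDVI only) by cross-validated RMSE in standardized units.

```python
import numpy as np
from sklearn.model_selection import KFold, cross_val_score
from sklearn.tree import DecisionTreeRegressor

FACTORS = ["elevation", "slope", "aspect", "NDVI", "air_temperature", "precipitation"]


def zscore(a):
    """Standardize each column (or a 1-D vector) to zero mean and unit variance."""
    a = np.asarray(a, dtype=float)
    return (a - a.mean(axis=0)) / a.std(axis=0)


def fit_min_cost_tree(X, y, n_folds=10, seed=0):
    """Return the pruned tree with the lowest k-fold CV mean square error, and its CV RMSE.

    Cost-complexity pruning stands in for the minimum error pruning of the text.
    """
    X, y = zscore(X), zscore(y)
    path = DecisionTreeRegressor(random_state=seed).cost_complexity_pruning_path(X, y)
    cv = KFold(n_splits=n_folds, shuffle=True, random_state=seed)
    best_alpha, best_mse = 0.0, np.inf
    for alpha in np.unique(np.clip(path.ccp_alphas, 0.0, None)):
        tree = DecisionTreeRegressor(ccp_alpha=alpha, random_state=seed)
        scores = cross_val_score(tree, X, y, cv=cv, scoring="neg_mean_squared_error")
        mse = -scores.mean()
        if mse < best_mse:
            best_alpha, best_mse = alpha, mse
    best_tree = DecisionTreeRegressor(ccp_alpha=best_alpha, random_state=seed).fit(X, y)
    return best_tree, float(np.sqrt(best_mse))


# Synthetic example: a hypothetical SCD response driven mainly by air temperature.
rng = np.random.default_rng(42)
X = rng.normal(size=(100, len(FACTORS)))
y = -3.0 * X[:, 4] + X[:, 0] + 0.3 * rng.normal(size=100)
tree_case1, rmse_case1 = fit_min_cost_tree(X, y)          # Case 1: all six factors
tree_case2, rmse_case2 = fit_min_cost_tree(X[:, :4], y)   # Case 2: topography + NDVI only
```

With this synthetic response, dropping the climate columns should raise the cross-validated RMSE, mirroring the kind of comparison made between the two cases.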

Distribution analysis of snow cover duration
Fig. 2 shows the spatial distribution of SCD in the three basins of the Songhua River from HY2003 to HY2014. In all three basins, HY2013 and HY2010 had longer SCDs and more extensive snow cover than other years, whereas HY2008 had shorter SCDs and less extensive snow cover. The highest SCDs occurred in the mountainous areas at higher elevations. In the NJ, higher SCDs were mainly found in the northeast, around Nenjiang County (NJC) and Wudalianchi (WDLC), where snow cover lasted more than 4 months. In the SD, high values were mostly located around Yichun (YC), Tieli (TL) and Shangzhi (SZ), where snow cover persisted for 4 to 6 months. Areas with high SCD (>120 d) extended from the northeastern NJ to the northwestern SD, as is clearly visible in HY2007. In the SU, high values were mostly located in the Changbai Mountains (CM), where the highest values exceeded 6 months. SCD in the SU decreased from southeast to northwest, following the decrease in elevation. In general, the SD and SU had longer SCDs and more extensive snow cover than the NJ. The highest SCDs were located in the CM and the southeastern SD, and the lowest in the southern NJ. Fig. 3 shows the spatial distribution of mean SCD and the area fractions of 30-day SCD classes. The mean SCDs of the NJ, SD and SU were 69.43, 98.14 and 88.84 d, with standard deviations of 37.42, 29.59 and 27.89 d, respectively. Compared with the other two basins, the NJ had the highest area fractions in the 0-30 and 121-150 d classes and the lowest in the 151-180 d class. In the 61-90, 91-120 and 121-150 d classes, the area fractions of the SD and SU were close to each other, whereas those of the NJ differed. SCD exceeded 3 months in 36.00% of the NJ, 43.92% of the SD and 46.20% of the SU. It exceeded 4 months in 18.06% of the NJ, 10.76% of the SD and 10.07% of the SU. The SU had the highest area fraction (2.82%) with SCD over 5 months. The fractions with SCD over 180 days in the NJ and SU were less than 0.03%, too small to be visible in Fig. 3.
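These class statistics can be reproduced from an SCD raster with a short sketch. It assumes SCD is stored in days with NaN outside the basin, that the classes are 0-30, 31-60, …, 151-180 and >180 d, and that "longer than 3 months" means more than 90 days. Both function names are hypothetical.

```python
import numpy as np


def scd_class_fractions(scd, width=30, n_bounded=6):
    """Percentage of valid pixels in SCD classes 0-30, 31-60, ..., plus an open top class."""
    scd = np.asarray(scd, dtype=float).ravel()
    scd = scd[np.isfinite(scd)]                      # drop NaN (outside basin / no data)
    # Class index: 0-30 d -> 0, 31-60 d -> 1, ...; SCD = 0 falls in the first class.
    idx = np.maximum(np.ceil(scd / width), 1).astype(int) - 1
    idx = np.minimum(idx, n_bounded)                 # everything above the last bound
    counts = np.bincount(idx, minlength=n_bounded + 1)
    labels = [f"{k * width + (1 if k else 0)}-{(k + 1) * width}" for k in range(n_bounded)]
    labels.append(f">{n_bounded * width}")
    return dict(zip(labels, 100.0 * counts / scd.size))


def fraction_longer_than(scd, days):
    """Percentage of valid pixels whose SCD exceeds the given number of days."""
    scd = np.asarray(scd, dtype=float)
    valid = scd[np.isfinite(scd)]
    return 100.0 * np.mean(valid > days)
```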

Construction and selection of binary decision tree
The binary decision trees treated yearly SCD values as the dependent variable and the six impact factors as independent variables. Two cases were considered: Case 1 used all six factors, whereas Case 2 used only the topographic and vegetation factors. In total, six trees were built (three basins × two cases), and the minimum-cost tree for each basin was selected for further analysis. The minimum-cost tree minimizes both the model deviance and the mean square error. Fig. 4 shows the root mean square error (RMSE) of the six trees built with CART in the two cases; a higher RMSE indicates lower modelling accuracy. RMSE decreased when climate factors were included, meaning that the accuracy of the binary decision trees increased. This held in all three basins; for example, the RMSE for the NJ decreased from 0.67 to 0.49. Thus, taking climate factors into account improves model performance. Table 3 summarizes the order in which the six parameters appear in the minimum-cost trees of the three basins, read from top to bottom and from left to right. Elevation, precipitation and air temperature mainly occupied the first three positions. Air temperature was the most important factor, appearing first in the NJ and SD. Elevation appeared first in the SU, second in the SD and third in the NJ. Precipitation appeared second in the SU and NJ, and slope appeared third in the SD and SU. The appearance order indicates the importance of the controlling factors: the earlier a factor appears, the more dominant its role. Based on the first appearance and frequency of each factor, the priority order is: air temperature > precipitation > elevation > slope > NDVI > aspect.
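Reading a tree from top to bottom and from left to right amounts to a breadth-first traversal. The sketch below ranks variables by first appearance and also counts how often each is used in a split. It takes the parallel node arrays that scikit-learn exposes on a fitted tree (`tree_.children_left`, `tree_.children_right`, `tree_.feature`), where a leaf has `children_left == -1`. The tree in the test is hand-built for illustration and is not one of the study's trees.

```python
from collections import Counter, deque


def appearance_order(children_left, children_right, feature, names):
    """Breadth-first (top-down, left-to-right) first-appearance order and split counts."""
    order, counts = [], Counter()
    queue = deque([0])                       # start at the root node
    while queue:
        node = queue.popleft()
        left, right = children_left[node], children_right[node]
        if left == -1:                       # leaf: no splitting variable
            continue
        name = names[feature[node]]
        counts[name] += 1
        if name not in order:
            order.append(name)
        queue.extend((left, right))          # left child is visited before the right
    return order, counts
```

For a fitted scikit-learn regressor `tree`, the call would be `appearance_order(tree.tree_.children_left, tree.tree_.children_right, tree.tree_.feature, names)`. The returned counts supply the frequency used alongside first appearance.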

Discussion
Fig. 5 shows the annual means of SCD, daily air temperature (℃) and yearly total precipitation (mm) in the three basins from HY2003 to HY2014. The numbers of climate stations in the NJ, SD and SU were 15, 15 and 11, respectively. SCD increased in all three basins, with annual increases of 1.36, 2.04 and 2.71 d, respectively, estimated as the slopes of linear regressions of SCD against year. Notably, the spatial distributions in HY2013 and HY2010 differed markedly from those of other years, especially in HY2013. Among all 12 years, HY2013 had high precipitation and the lowest air temperature. In the NJ, for example, HY2013 had the highest precipitation and the lowest air temperature of the period. HY2010 showed a similar situation. In Northeast China, winter precipitation falls mainly as snow. Snowfall tends to occur at low temperatures, so snow cover was widespread in these two years, and the low temperatures allowed it to persist longer in winter.
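The annual growth values are slopes of ordinary least-squares lines fitted to yearly mean SCD. A minimal NumPy sketch follows. The series in the test is synthetic and is not the study's data.

```python
import numpy as np


def annual_trend(years, values):
    """Slope (units per year) of the ordinary least-squares line through a yearly series."""
    slope, _intercept = np.polyfit(np.asarray(years, dtype=float),
                                   np.asarray(values, dtype=float), deg=1)
    return float(slope)
```

Applied to twelve yearly mean SCD values for HY2003 to HY2014, the slope is the annual increase in days per year.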
The priority order derived from the trees (air temperature > precipitation > elevation > slope > NDVI > aspect) identifies air temperature, precipitation and elevation as the three most important factors. The response of SCD to climate clearly varies with elevation. Knowles et al. (2006) concluded that an elevation threshold determines whether precipitation falls as rain or snow. Daly et al. (1994) found that the influence of surface temperature on the timing of snowfall and snowmelt also changes markedly with elevation. In mountainous areas such as the SU, precipitation plays a more crucial role than air temperature, whereas in plain areas such as the SD the reverse is true. This suggests that an elevation threshold may govern the relative roles of precipitation and air temperature. Above such a threshold, snow cover persistence would depend mainly on available precipitation. Below it, persistence would depend on available heat energy, which can be expressed by air temperature.
In general, elevation ranks as a first-order factor because the other parameters are either closely related to elevation or can be derived directly from it (Trujillo et al., 2012). Elevation may affect the type and abundance of vegetation, and the simulation results showed that climate exerts a crucial influence on snow cover dynamics. Air temperature can be estimated from elevation using the lapse rate, defined as the rate at which atmospheric temperature decreases with increasing altitude. Slope and aspect can also be calculated directly from elevation data. As a result, the distribution of snow cover is closely related to elevation and elevation gradients.
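A short sketch makes the dependence of slope, aspect and temperature on elevation concrete. Slope and aspect are computed from a north-up DEM by central differences (`np.gradient`). Aspect is measured clockwise from north toward the downslope direction, with -1 marking flat cells. Temperature is extrapolated with a constant lapse rate of 6.5 °C/km, the standard-atmosphere value. The text gives no lapse rate, so this value is an assumption, and the function names are illustrative.

```python
import numpy as np


def slope_aspect(dem, cell_size=90.0):
    """Slope (degrees) and aspect (degrees clockwise from north, -1 on flats) of a north-up DEM."""
    dz_drow, dz_dcol = np.gradient(np.asarray(dem, dtype=float), cell_size)
    dz_east, dz_north = dz_dcol, -dz_drow                # rows increase southward
    slope = np.degrees(np.arctan(np.hypot(dz_east, dz_north)))
    # Aspect is the compass bearing of the downslope direction (minus the gradient).
    aspect = np.degrees(np.arctan2(-dz_east, -dz_north)) % 360.0
    aspect[(dz_east == 0) & (dz_north == 0)] = -1.0      # undefined on flat cells
    return slope, aspect


LAPSE_RATE = 6.5  # degrees C per km; standard-atmosphere value, assumed here


def temperature_at(z, t_ref, z_ref, lapse_rate=LAPSE_RATE):
    """Extrapolate air temperature from a reference station at z_ref (m) to elevation z (m)."""
    return t_ref - lapse_rate * (np.asarray(z, dtype=float) - z_ref) / 1000.0
```

With this lapse rate, a cell 1000 m above a station would be estimated 6.5 °C colder. This is one way elevation carries part of the temperature signal in the trees.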
The factors that control snow distribution differ with spatial scale. Latitude and elevation influence the distribution of snow cover at large scales (10-1000 km), whereas vegetation and topography control changes in snow cover at medium scales (1-10 km) (Balk and Elder, 2000). Solar radiation and snowdrift are the primary factors at smaller scales. They were not considered in this study because of the 500 m spatial resolution of the satellite data. It is difficult to identify the influence of snowdrift and solar radiation over the whole area, or even selectively for areas with complex topography.

Conclusions
In this study, SCDs in three basins of the Songhua River in Northeast China were compared based on MODIS data, and the influence of topography, climate and vegetation on snow cover was analyzed using binary decision trees. The results show that the average SCDs of the NJ, SU and SD basins are 69.43, 98.14 and 88.84 d, with annual increases of 1.36, 2.04 and 2.71 d, respectively. The binary decision trees show that elevation, precipitation and air temperature are the three most influential factors affecting snow cover persistence. The spatial pattern of SCD varies vertically and is closely related to elevation, as the differing patterns of the three basins show. In the mountainous areas, precipitation is the more crucial influence, whereas in the plain areas air temperature plays the more crucial role. The roles of precipitation and air temperature depend closely on topography. Future work will examine whether an elevation threshold for precipitation and air temperature exists. It will consider additional factors, such as solar radiation and wind redistribution, that control the physical processes of snow accumulation and melt.

Fig. 1 Spatial distribution of three basins of Songhua River in Northeast China: Nenjiang River Basin (NJ), Downstream Songhua River Basin (SD) and Upstream Songhua River Basin (SU).

Fig. 2 Fig. 3
Fig. 2 Spatial distribution of snow cover duration (SCD) in three basins of Songhua River, Northeast China from hydrological year (HY) 2003 to HY2014

Fig. 5 Annual means of SCD, daily air temperature (℃) and total precipitation (mm) in the Nenjiang River Basin (NJ, a), Downstream Songhua River Basin (SD, b) and Upstream Songhua River Basin (SU, c) from HY2003 to HY2014

Table 1
Overview of the three basins of Songhua River in Northeast China: Nenjiang River Basin (NJ), Downstream Songhua River Basin (SD) and Upstream Songhua River Basin (SU)

Table 2
Cloud cover fraction (%) and snow cover fraction (%) of four Moderate Resolution Imaging Spectroradiometer (MODIS) snow cover products (Chen et al., 2014). Image combination effectively reduced the number of cloud-covered pixels, in agreement with our previous work for the whole of Northeast China (Chen et al., 2014).

Table 3
Appearance order of six parameters in the classification tree in three basins: Nenjiang River Basin (NJ), Downstream Songhua River Basin (SD) and Upstream Songhua River Basin (SU)