Citation: QU Le’an, LI Manchun, CHEN Zhenjie, ZHI Junjun, 2021. A Modified Self-adaptive Method for Mapping Annual 30-m Land Use/Land Cover Using Google Earth Engine: A Case Study of Yangtze River Delta. Chinese Geographical Science, 31(5): 782–794. doi: 10.1007/s11769-021-1226-4

A Modified Self-adaptive Method for Mapping Annual 30-m Land Use/Land Cover Using Google Earth Engine: A Case Study of Yangtze River Delta

doi: 10.1007/s11769-021-1226-4
Funds: Under the auspices of the National Key Research and Development Program of China (No. 2017YFB0504205), National Natural Science Foundation of China (No. 41571378), Natural Science Research Project of Higher Education in Anhui Province (No. KJ2020A0089)
More Information
  • Corresponding author: LI Manchun. E-mail: Manchunli@nju.edu.cn
  • Received Date: 2021-01-18
  • Accepted Date: 2021-05-11
  • Publish Date: 2021-09-05
  • Abstract: Annual Land Use/Land Cover (LULC) change information at medium spatial resolution (i.e., at 30 m) is used in applications ranging from land management to achieving sustainable development goals related to food security. However, obtaining annual LULC information over large areas and long periods is challenging due to limitations on computational capabilities, training data, and workflow design. Using the Google Earth Engine (GEE), which provides a catalog of multi-source data and a cloud-based environment, we developed a novel methodology to generate a high-accuracy 30-m LULC map collection of the Yangtze River Delta by integrating free and public LULC products with Landsat imagery. Our major contribution is a hybrid approach that includes three components: 1) a high-quality training dataset derived from multi-source LULC products and filtered by k-means clustering analysis; 2) a yearly 39-band feature stack built from all available Landsat data and DEM data; and 3) a self-adaptive Random Forest (RF) method for LULC classification. Experimental results show that the proposed workflow achieves an average classification accuracy of 86.33% across the entire Delta. The results demonstrate the great potential of integrating multi-source LULC products for producing LULC maps of increased reliability. In addition, because the proposed workflow is based on open-source data and the GEE cloud platform, it can be used anywhere by anyone in the world.
  • [1] Adepoju K A, Adelabu S A, 2020. Improving accuracy of Landsat-8 OLI classification using image composite and multisource data with Google Earth Engine. Remote Sensing Letters, 11(2): 107–116. doi:  10.1080/2150704X.2019.1690792
    [2] Anchang J Y, Prihodko L, Ji W J et al., 2020. Toward operational mapping of woody canopy cover in tropical savannas using Google Earth Engine. Frontiers in Environmental Science, 8: 4. doi:  10.3389/fenvs.2020.00004
    [3] Bailly A, Chapel L, Tavenard R et al., 2017. Nonlinear time-series adaptation for land cover classification. IEEE Geoscience and Remote Sensing Letters, 14(6): 896–900. doi:  10.1109/LGRS.2017.2686639
    [4] Bullock E L, Woodcock C E, Olofsson P, 2020. Monitoring tropical forest degradation using spectral unmixing and Landsat time series analysis. Remote Sensing of Environment, 238: 110968. doi:  10.1016/j.rse.2018.11.011
    [5] Capolupo A, Monterisi C, Tarantino E, 2020. Landsat images classification algorithm (LICA) to automatically extract land cover information in Google Earth Engine environment. Remote Sensing, 12(7): 1201. doi:  10.3390/rs12071201
    [6] Chakraborty A, Sachdeva K, Joshi P K, 2016. Mapping long-term land use and land cover change in the central Himalayan region using a tree-based ensemble classification approach. Applied Geography, 74: 136–150. doi:  10.1016/j.apgeog.2016.07.008
    [7] Chen Lin, Ren Chunying, Zhang Bai et al., 2018. Spatiotemporal dynamics of coastal wetlands and reclamation in the Yangtze estuary during past 50 years (1960s–2015). Chinese Geographical Science, 28(3): 386–399. doi:  10.1007/s11769-017-0925-3
    [8] Chen S, Li G, Xu Z G et al., 2019. Combined impact of socioeconomic forces and policy implications: spatial-temporal dynamics of the ecosystem services value in Yangtze River Delta, China. Sustainability, 11(9): 2622. doi:  10.3390/su11092622
    [9] Daldegan G A, Roberts D A, de Figueiredo Ribeiro F, 2019. Spectral mixture analysis in Google Earth Engine to model and delineate fire scars over a large extent and a long time-series in a rainforest-savanna transition zone. Remote Sensing of Environment, 232: 111340. doi:  10.1016/j.rse.2019.111340
    [10] Feng Y J, Liu Y, Tong X H, 2018. Comparison of metaheuristic cellular automata models: a case study of dynamic land use simulation in the Yangtze River Delta. Computers Environment and Urban Systems, 70: 138–150. doi:  10.1016/j.compenvurbsys.2018.03.003
    [11] Ghorbanian A, Kakooei M, Amani M et al., 2020. Improved land cover map of Iran using Sentinel imagery within Google Earth Engine and a novel automatic workflow for land cover classification using migrated training samples. ISPRS Journal of Photogrammetry and Remote Sensing, 167: 276–288. doi:  10.1016/j.isprsjprs.2020.07.013
    [12] Ghosh A, Sharma R, Joshi P K, 2014. Random forest classification of urban landscape using Landsat archive and ancillary data: combining seasonal maps with decision level fusion. Applied Geography, 48: 31–41. doi:  10.1016/j.apgeog.2014.01.003
    [13] Gong P, Liu H, Zhang M N et al., 2019. Stable classification with limited sample: transferring a 30-m resolution sample set collected in 2015 to mapping 10-m resolution global land cover in 2017. Science Bulletin, 64(6): 370–373. doi:  10.1016/j.scib.2019.03.002
    [14] Gorelick N, Hancher M, Dixon M et al., 2017. Google Earth Engine: planetary-scale geospatial analysis for everyone. Remote Sensing of Environment, 202: 18–27. doi:  10.1016/j.rse.2017.06.031
    [15] Gumma M K, Thenkabail P S, Teluguntla P G et al., 2020. Agricultural cropland extent and areas of South Asia derived using Landsat satellite 30-m time-series big-data using random forest machine learning algorithms on the Google Earth Engine cloud. GIScience & Remote Sensing, 57(3): 302–322. doi:  10.1080/15481603.2019.1690780
    [16] Hird J N, DeLancey E R, McDermid G J et al., 2017. Google Earth Engine, open-access satellite data, and machine learning in support of large-area probabilistic wetland mapping. Remote Sensing, 9(12): 1315. doi:  10.3390/rs9121315
    [17] Huang H B, Chen Y L, Clinton N et al., 2017. Mapping major land cover dynamics in Beijing using all Landsat images in Google Earth Engine. Remote Sensing of Environment, 202: 166–176. doi:  10.1016/j.rse.2017.02.021
    [18] Huang H B, Wang J, Liu C X et al., 2020. The migration of training samples towards dynamic global land cover mapping. ISPRS Journal of Photogrammetry and Remote Sensing, 161: 27–36. doi:  10.1016/j.isprsjprs.2020.01.010
    [19] Hurni K, Van Den Hoek J, Fox J, 2019. Assessing the spatial, spectral, and temporal consistency of topographically corrected Landsat time series composites across the mountainous forests of Nepal. Remote Sensing of Environment, 231: 111225. doi:  10.1016/j.rse.2019.111225
    [20] Ji H Y, Li X, Wei X C et al., 2020. Mapping 10-m resolution rural settlements using multi-source remote sensing datasets with the Google Earth Engine platform. Remote Sensing, 12(17): 2832. doi:  10.3390/rs12172832
    [21] Kakooei M, Baleghi Y, 2020. VHR semantic labeling by random forest classification and fusion of spectral and spatial features on Google Earth Engine. Journal of AI and Data Mining, 8(3): 357–370. doi:  10.22044/JADM.2020.8252.1964
    [22] Li H, Wan W, Fang Y et al., 2019. A Google Earth Engine-enabled software for efficiently generating high-quality user-ready Landsat mosaic images. Environmental Modelling & Software, 112: 16–22. doi:  10.1016/j.envsoft.2018.11.004
    [23] Li H, Wang C Z, Zhong C et al., 2017. Mapping urban bare land automatically from Landsat imagery with a simple index. Remote Sensing, 9(3): 249. doi:  10.3390/rs9030249
    [24] Li W J, Dong R M, Fu H H et al., 2020. Integrating Google Earth imagery with Landsat data to improve 30-m resolution land cover mapping. Remote Sensing of Environment, 237: 111563. doi:  10.1016/j.rse.2019.111563
    [25] Li X C, Gong P, 2016. An ‘exclusion-inclusion’ framework for extracting human settlements in rapidly developing regions of China from Landsat images. Remote Sensing of Environment, 186: 286–296. doi:  10.1016/j.rse.2016.08.029
    [26] Liu H, Gong P, Wang J et al., 2020. Annual dynamics of global land cover and its long-term changes from 1982 to 2015. Earth System Science Data, 12(2): 1217–1243. doi:  10.5194/essd-12-1217-2020
    [27] Liu H, Gong P, Wang J et al., 2021. Production of global daily seamless data cubes and quantification of global land cover change from 1985 to 2020: iMap World 1.0. Remote Sensing of Environment, 258: 112364. doi:  10.1016/j.rse.2021.112364
    [28] Long H L, 2014. Land consolidation: an indispensable way of spatial restructuring in rural China. Journal of Geographical Sciences, 24(2): 211–225. doi:  10.1007/s11442-014-1083-5
    [29] Lopes M, Fauvel M, Ouin A et al., 2017. Spectro-temporal heterogeneity measures from dense high spatial resolution satellite image time series: application to grassland species diversity estimation. Remote Sensing, 9(10): 993. doi:  10.3390/rs9100993
    [30] Mack B, Leinenkugel P, Kuenzer C et al., 2017. A semi-automated approach for the generation of a new land use and land cover product for Germany based on Landsat time-series and Lucas in-situ data. Remote Sensing Letters, 8(3): 244–253. doi:  10.1080/2150704X.2016.1249299
    [31] Mahdianpari M, Jafarzadeh H, Granger J E et al., 2020. A large-scale change monitoring of wetlands using time series Landsat imagery on Google Earth Engine: a case study in Newfoundland. GIScience & Remote Sensing, 57(8): 1102–1124. doi:  10.1080/15481603.2020.1846948
    [32] Mao D H, Luo L, Wang Z M et al., 2018. Conversions between natural wetlands and farmland in China: a multiscale geospatial analysis. Science of the Total Environment, 634: 550–560. doi:  10.1016/j.scitotenv.2018.04.009
    [33] Mao D H, Tian Y L, Wang Z M et al., 2021. Wetland changes in the Amur River Basin: differing trends and proximate causes on the Chinese and Russian sides. Journal of Environmental Management, 111670. doi:  10.1016/j.jenvman.2020.111670
    [34] Millard K, Richardson M, 2015. On the importance of training data sample selection in random forest image classification: a case study in Peatland ecosystem mapping. Remote Sensing, 7(7): 8489–8515. doi:  10.3390/rs70708489
    [35] Mohajane M, Essahlaoui A, Oudija F et al., 2018. Land use/land cover (LULC) using Landsat data series (MSS, TM, ETM+ and OLI) in Azrou forest, in the central middle atlas of Morocco. Environments, 5(12): 131. doi:  10.3390/environments5120131
    [36] Müller H, Rufin P, Griffiths P et al., 2015. Mining dense Landsat time series for separating cropland and pasture in a heterogeneous Brazilian savanna landscape. Remote Sensing of Environment, 156: 490–499. doi:  10.1016/j.rse.2014.10.014
    [37] Pandey P C, Koutsias N, Petropoulos G P et al., 2021. Land use/land cover in view of earth observation: data sources, input dimensions, and classifiers: a review of the state of the art. Geocarto International, 36(9): 957–988. doi:  10.1080/10106049.2019.1629647
    [38] Shen W S, Lin X G, Gao N et al., 2008. Land use intensification affects soil microbial populations, functional diversity and related suppressiveness of cucumber Fusarium wilt in China’s Yangtze River Delta. Plant and Soil, 306(1-2): 117–127. doi:  10.1007/s11104-007-9472-5
    [39] Simonetti D, Simonetti E, Szantoi Z et al., 2015. First results from the phenology-based synthesis classifier using Landsat 8 imagery. IEEE Geoscience and Remote Sensing Letters, 12(7): 1496–1500. doi:  10.1109/LGRS.2015.2409982
    [40] Tamiminia H, Salehi B, Mahdianpari M et al., 2020. Google Earth Engine for geo-big data applications: a meta-analysis and systematic review. ISPRS Journal of Photogrammetry and Remote Sensing, 164: 152–170. doi:  10.1016/j.isprsjprs.2020.04.001
    [41] Teluguntla P, Thenkabail P S, Oliphant A et al., 2018. A 30-m Landsat-derived cropland extent product of Australia and China using random forest machine learning algorithm on Google Earth Engine cloud computing platform. ISPRS Journal of Photogrammetry and Remote Sensing, 144: 325–340. doi:  10.1016/j.isprsjprs.2018.07.017
    [42] Viana C M, Girão I, Rocha J, 2019. Long-term satellite image time-series for land use/land cover change detection using refined open source data in a rural region. Remote Sensing, 11(9): 1104. doi:  10.3390/rs11091104
    [43] Wagle N, Acharya T D, Kolluru V et al., 2020. Multi-temporal land cover change mapping using Google Earth Engine and ensemble learning methods. Applied Sciences, 10(22): 8083. doi:  10.3390/app10228083
    [44] Wan L, Liu H Y, Gong H B et al., 2020. Effects of climate and land use changes on vegetation dynamics in the Yangtze River Delta, China based on abrupt change analysis. Sustainability, 12(5): 1955. doi:  10.3390/su12051955
    [45] Wu Q S, 2020. Geemap: a python package for interactive mapping with Google Earth Engine. Journal of Open Source Software, 5(51): 2305. doi:  10.21105/joss.02305
    [46] Xu H Z Y, Wei Y C, Liu C et al., 2019. A scheme for the long-term monitoring of impervious-relevant land disturbances using high frequency Landsat archives and the Google Earth Engine. Remote Sensing, 11(16): 1891. doi:  10.3390/rs11161891
    [47] Xu J P, Xiao W, He T T et al., 2021. Extraction of built-up area using multi-sensor data: a case study based on Google Earth Engine in Zhejiang Province, China. International Journal of Remote Sensing, 42(2): 389–404. doi:  10.1080/01431161.2020.1809027
    [48] Xu X B, Yang G S, Tan Y et al., 2018. Ecosystem services trade-offs and determinants in China’s Yangtze River Economic Belt from 2000 to 2015. Science of the Total Environment, 634: 1601–1614. doi:  10.1016/j.scitotenv.2018.04.046
    [49] Yu M, Yang Y J, Chen F et al., 2019. Response of agricultural multifunctionality to farmland loss under rapidly urbanizing processes in Yangtze River Delta, China. Science of the Total Environment, 666: 1–11. doi:  10.1016/j.scitotenv.2019.02.226
    [50] Zeng L L, Wardlow B D, Xiang D X et al., 2020. A review of vegetation phenological metrics extraction using time-series, multispectral satellite data. Remote Sensing of Environment, 237: 111511. doi:  10.1016/j.rse.2019.111511
    [51] Zhai Y G, Qu Z Y, Hao L, 2018. Land cover classification using integrated spectral, temporal, and spatial features derived from remotely sensed images. Remote Sensing, 10(3): 383. doi:  10.3390/rs10030383
    [52] Zhang C, Wei S Q, Ji S P et al., 2019. Detecting large-scale urban land cover changes from very high resolution remote sensing images using CNN-based classification. ISPRS International Journal of Geo-Information, 8(4): 189. doi:  10.3390/ijgi8040189
    [53] Zhang D J, Pan Y Z, Zhang J S et al., 2020. A generalized approach based on convolutional neural networks for large area cropland mapping at very high resolution. Remote Sensing of Environment, 247: 111912. doi:  10.1016/j.rse.2020.111912
    [54] Zhao F, Huang C Q, Zhu Z L, 2015. Use of vegetation change tracker and support vector machine to map disturbance types in Greater Yellowstone ecosystems in a 1984−2010 Landsat time series. IEEE Geoscience and Remote Sensing Letters, 12(8): 1650–1654. doi:  10.1109/LGRS.2015.2418159
    [55] Zhao J, Zhong Y F, Hu X et al., 2020. A robust spectral-spatial approach to identifying heterogeneous crops using remote sensing imagery with high spectral and spatial resolutions. Remote Sensing of Environment, 239: 111605. doi:  10.1016/j.rse.2019.111605

Figures(8)  / Tables(1)


  • Land Use/Land Cover (LULC) data are important in applications such as land management, agricultural monitoring, ecological service research, and climate change assessment (Zhao et al., 2015; Mao et al., 2018; Teluguntla et al., 2018; Gong et al., 2019). As advances in remote sensing technology have made satellite imagery widely available, the resulting datasets have been effectively applied to classify LULC types at spatial scales ranging from global to local (Mohajane et al., 2018; Zhang et al., 2019; Li et al., 2020). Among these datasets, Landsat provides free global coverage with a longer historical record than other open-access remotely sensed data (e.g., MODIS, Sentinel-2), which is advantageous for LULC classification tasks (Li and Gong, 2016; Huang et al., 2017; Liu et al., 2020; Mao et al., 2021).

    The conventional approach to large-area, long-term LULC classification applies a supervised non-parametric classifier to remote sensing-derived time series (Zhai et al., 2018; Bullock et al., 2020). In areas with strong landscape heterogeneity, this kind of supervised classifier has limitations (Ghosh et al., 2014; Müller et al., 2015). First, strong landscape heterogeneity often produces the phenomenon of ‘different objects with the same spectrum, different spectra within the same object’ (Long, 2014), and further research is needed to develop high-precision classifiers (Chen et al., 2018). Second, a large amount of long-term training data is difficult to obtain (Viana et al., 2019). Finally, a large study area requires processing a large volume of remote sensing data, which demands strong computational and storage capabilities (Daldegan et al., 2019; Tamiminia et al., 2020).

    The random forest (RF) classifier is a commonly used supervised classifier for LULC mapping (Chakraborty et al., 2016). In general, combining remote sensing time series data with the RF classifier can yield an LULC map with high classification accuracy in an area with strong landscape heterogeneity (Zeng et al., 2020). This is because the phenological information provided by time series data can partially solve the problem of ‘different objects with the same spectrum, different spectra within the same object’ (Lopes et al., 2017). However, challenges remain for large-area LULC classification based on time series (Ji et al., 2020). In particular, higher classification accuracy can be obtained only with a large amount of high-precision training data (Ghorbanian et al., 2020).

    Over the past decade, existing LULC maps have been shown to be an efficient source of training data (Viana et al., 2019). This approach is advantageous because it: 1) allows classification in an automated manner without the need for manually and interactively collected training data, 2) provides a potentially large and geographically distributed training dataset, and 3) enables satellite data to be mapped with existing LULC maps (Pandey et al., 2021). However, existing LULC maps may contain misclassified pixels (Mack et al., 2017). Therefore, their use as a source should be carefully considered to ensure that the generated training data are sufficiently accurate (Li et al., 2017).

    Computational power, data storage, data management, and processing time have also traditionally restricted the use of remote sensing time series over large areas (Anchang et al., 2020). Google Earth Engine (GEE) not only provides powerful computing capacity for free but also gives direct access to a variety of open-source data (Hird et al., 2017; Capolupo et al., 2020). A workflow designed on GEE therefore not only overcomes the limitations of computing power and data sources but also helps other researchers to conduct similar research (Wu, 2020; Xu et al., 2021).

    The aims of the present study are: 1) to classify Landsat time series data on GEE to obtain high-precision annual LULC maps of the Yangtze River Delta (YRD) from 1992 to 2015 and 2) to analyze the characteristics of LULC changes in the YRD region. Specifically, we address the following research questions: 1) How can a high-precision sample dataset be generated from multi-source LULC products? 2) How can a feature space be constructed so that time series classification can be carried out in a cloudy and rainy area such as the YRD? 3) How can a locally adaptive classifier be built to improve LULC classification accuracy in an area with strong landscape heterogeneity?

  • The Yangtze River Delta (YRD) is situated in eastern China and covers three provinces and one municipality: Anhui Province, Jiangsu Province, Zhejiang Province, and Shanghai Municipality (Fig. 1), with a surface area of 348 000 km² (Chen et al., 2019). The topography of the YRD is dominated by plains, hillsides, and mountains (Feng et al., 2018). The average elevation is 140.17 m (Shen et al., 2008). The region has a monsoon climate; the average temperature for the growing season (March to October) is 18℃ to 22℃ and the average precipitation for the growing season is 800 to 1400 mm (Wan et al., 2020).

    Figure 1.  Geographical location of the Yangtze River Delta (YRD) in eastern China

    The YRD is one of China’s most economically developed regions and plays an important role in the social and economic development of the country (Yu et al., 2019). In 2015, the YRD accounted for 16.06% of China’s population and 23.50% of its GDP (Zhang et al., 2020). According to the Statistical Yearbook from China National Knowledge Infrastructure (http://www.cnki.net/), the population and GDP of the YRD region increased significantly from 1992 to 2015. This population growth and economic development have been accompanied by rapid urbanization and great changes in land use in the region, namely the loss of farmland and the expansion of urban areas (Xu et al., 2018).

  • To construct a sample set for LULC classification, we used several well-recognized LULC products: the annual European Space Agency Climate Change Initiative (ESA-CCI300) LULC products for 1992–2015, the annual MCD12Q1 (MODIS land cover) products for 2001–2015, the 2000 and 2010 GlobeLand30 LULC data developed by the China National Basic Geographic Information Center, and the 2015 Finer Resolution Observation and Monitoring of Global Land Cover (From-GLC) product developed by Tsinghua University, China (Table 1). The MCD12Q1 products can be accessed directly on the GEE platform and do not need to be uploaded (Gorelick et al., 2017). The other three sets of LULC products need to be uploaded to the GEE platform. All work was performed at 30 m resolution; the ESA-CCI300 and MCD12Q1 products were resampled on the GEE platform.

    | Data | Year | Temporal resolution | Spatial resolution / m | Data sources |
    | --- | --- | --- | --- | --- |
    | Landsat 5* | 1992–2012 | 16 d | 30 | http://landsat.usgs.gov |
    | Landsat 7* | 1999–2015 | 16 d | 30 | http://landsat.usgs.gov |
    | Landsat 8* | 2013–2015 | 16 d | 30 | http://landsat.usgs.gov |
    | SRTM3* | 2000 | | 30 | http://www2.jpl.nasa.gov/srtm |
    | ESA-CCI300 | 1992–2015 | 1 yr | 300 | https://www.esa-landcover-cci.org |
    | MCD12Q1.006* | 2001–2015 | 1 yr | 500 | https://lpdaac.usgs.gov/dataset_discovery/modis/modis_products_table/mcd12q1 |
    | GlobeLand30 | 2000/2010 | | 30 | http://www.globeland30.com |
    | From-GLC | 2015 | | 30 | http://data.ess.tsinghua.edu.cn |
    | Boundary | 2015 | | | http://www.resdc.cn |

    Notes: * represents data available online (https://earthengine.google.com). ESA-CCI300 (European Space Agency Climate Change Initiative), MCD12Q1.006, GlobeLand30, and From-GLC (Finer Resolution Observation and Monitoring of Global Land Cover) are LULC (Land Use/Land Cover) products

    Table 1.  Datasets used in this research

    Landsat 5/7/8 Surface Reflectance (SR) data from 1992 to 2015 are the main remote sensing images used for classification and can be accessed directly in GEE (Xu et al., 2019). Previous research has shown that the Normalized Difference Vegetation Index (NDVI) is sensitive to vegetation characteristics, the Normalized Difference Water Index (NDWI) can identify bodies of water, and the Normalized Difference Built-up Index (NDBI) can distinguish built-up areas effectively (Simonetti et al., 2015; Bailly et al., 2017; Wagle et al., 2020). In this study, these spectral indices (NDVI, NDWI, and NDBI), calculated from the Landsat SR products for the YRD, are also used as inputs to the spectral-temporal metrics.
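
The three indices are all normalized differences of two bands. The following is a minimal sketch in plain Python, not the authors' GEE code. The text does not state which NDWI formulation was used, so the green/NIR form is an assumption here, and the reflectance values are made up for illustration.

```python
def normalized_difference(a, b):
    """Return (a - b) / (a + b), or None when the denominator is zero."""
    total = a + b
    if total == 0:
        return None
    return (a - b) / total


def spectral_indices(green, red, nir, swir1):
    """Compute NDVI, NDWI and NDBI from surface reflectance values."""
    return {
        "NDVI": normalized_difference(nir, red),
        "NDWI": normalized_difference(green, nir),  # green/NIR form (assumed)
        "NDBI": normalized_difference(swir1, nir),
    }


# Illustrative (made-up) reflectances for a vegetated pixel.
vegetation = spectral_indices(green=0.08, red=0.05, nir=0.45, swir1=0.20)
print(vegetation)
```

For a vegetated pixel such as this one, NDVI is strongly positive while NDWI and NDBI are negative, which is the kind of contrast the classifier relies on.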

    Topographic features can affect regional climatic conditions and vegetation growth, so they are often used as auxiliary data for LULC classification (Adepoju and Adelabu, 2020). Topographic features generated from a DEM (Digital Elevation Model) include elevation (which affects temperature and precipitation) as well as slope and aspect (which affect solar radiation and vegetation growth). We generated these topographic features from the Shuttle Radar Topography Mission (SRTM) data (Hurni et al., 2019), which are built into the GEE platform and can be used directly.

  • We extracted the SR Tier 1 datasets of Landsat 5/7 (bands B1–B5 and B7) and Landsat 8 (bands B2–B7) for the growing seasons of 1992–2015. The first pre-processing step was to build cloud-free Landsat tile mosaics for each year. For this, we used the CFMASK algorithm provided by GEE and the pixel quality assessment (pixel_qa) information available in the Landsat collection (Li et al., 2019). The cloud-masked Landsat scenes were combined to produce NDVI, NDWI, and NDBI for the growing season of each year. This procedure was designed to achieve optimal spectral contrast and separability among the LULC classes.
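
Cloud masking with pixel_qa amounts to testing individual bits of an integer flag. The sketch below shows the idea in plain Python. The bit positions (3 for cloud shadow, 5 for cloud) are an assumption taken from the Collection 1 surface reflectance band description, and the NDVI values are invented.

```python
CLOUD_SHADOW_BIT = 3  # assumed position of the cloud-shadow flag in pixel_qa
CLOUD_BIT = 5         # assumed position of the cloud flag in pixel_qa


def is_clear(pixel_qa):
    """True when neither the cloud nor the cloud-shadow bit is set."""
    shadow = (pixel_qa >> CLOUD_SHADOW_BIT) & 1
    cloud = (pixel_qa >> CLOUD_BIT) & 1
    return shadow == 0 and cloud == 0


def mask_observations(values, qa_values):
    """Keep only the observations whose pixel_qa marks them as clear."""
    return [v for v, qa in zip(values, qa_values) if is_clear(qa)]


# One pixel, four acquisition dates: NDVI values and matching pixel_qa codes.
ndvi_series = [0.61, 0.12, 0.58, 0.30]
qa_series = [66, 224, 66, 72]  # 66: clear, 224: cloud bit set, 72: shadow bit set
clear_ndvi = mask_observations(ndvi_series, qa_series)
print(clear_ndvi)
```

Only the clear observations then enter the annual statistics, so a single cloudy acquisition cannot drag down the yearly NDVI summary.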

    The annual Landsat mosaics were generated with statistical reducers: median, standard deviation, minimum, and maximum. After building the annual mosaics, the next step was to build the feature space for the self-adaptive RF classifier. For this process, we used the compositional, spectral, and temporal information extracted from the annual image mosaics. The NDVI values of all cloud-free pixels from each year were divided into quartiles; the median values of the upper quartile were taken as the wet-season image. Finally, a total of 39 bands, consisting of 4 × 9 temporal feature bands and 3 topographic feature bands (elevation, slope, and aspect), were available for LULC classification.
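
One way to read the 39-band count is sketched below for a single pixel. It assumes that the "4 × 9" temporal features are the four named reducers applied to nine layers (six reflectance bands plus NDVI, NDWI, and NDBI). That reading, the band names, and the use of the population standard deviation are assumptions rather than details given in the text.

```python
import statistics

# Six reflectance bands plus three indices (assumed composition of the 9 layers).
BANDS = ["blue", "green", "red", "nir", "swir1", "swir2", "NDVI", "NDWI", "NDBI"]
REDUCERS = {
    "median": statistics.median,
    "stdDev": statistics.pstdev,  # population standard deviation (assumed)
    "min": min,
    "max": max,
}


def temporal_features(series_by_band):
    """Reduce each band's cloud-free time series to four statistics."""
    features = {}
    for band in BANDS:
        values = series_by_band[band]
        for name, reducer in REDUCERS.items():
            features[f"{band}_{name}"] = reducer(values)
    return features


def build_feature_vector(series_by_band, elevation, slope, aspect):
    """Combine 36 temporal features with 3 topographic features."""
    features = temporal_features(series_by_band)
    features.update({"elevation": elevation, "slope": slope, "aspect": aspect})
    return features


# Made-up series: three clear observations per band for one pixel.
example_series = {band: [0.1, 0.2, 0.4] for band in BANDS}
pixel = build_feature_vector(example_series, elevation=35.0, slope=2.5, aspect=180.0)
print(len(pixel))
```

Summarizing the time series with order statistics keeps the feature count fixed regardless of how many clear observations a cloudy year provides.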

  • Supervised classification usually requires a certain number of training samples and validation samples (Zhao et al., 2020). Traditionally, sample points are obtained through manual visual interpretation (Huang et al., 2020). For a study with a large area and a long study period, such a method presents considerable practical difficulties (Ghorbanian et al., 2020). This study proposes a new method to obtain highly credible sample points. The specific steps are as follows:

    (1) Using From-GLC (2015) as the reference scheme, all other LULC products were reclassified into eight classes: cropland, forest, shrubland, grassland, wetland, water bodies, built-up land, and unused land.

    (2) ESA-CCI (1992–2015), MCD12Q1 (2001–2015), GlobeLand30 (2000, 2010), and the From-GLC (2015) data were overlaid, and pixels with completely consistent LULC types that had not changed from 1992 to 2015 were selected. ESA-CCI300 and MCD12Q1 were brought to 30 m resolution in GEE using the Reduce Resolution function; GEE performs nearest neighbor resampling by default.

    (3) From the selected pixels, training sample points were randomly selected; the number of points was adjusted according to the area ratios of the different LULC types.

    (4) The created samples still had a high probability of spectral-temporal signature confusion among LULC classes. Therefore, we applied the k-means clustering technique to refine the samples. The clustering process consists of three steps: 1) classification-trimmed likelihood calculation, 2) cluster computation, and 3) mean discriminant factor value calculation.
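
The consistency test in step (2) can be sketched for a single pixel as follows. This is an illustration in plain Python, not the GEE overlay the authors ran. The dictionary layout and the labels are hypothetical, and only a few years are shown.

```python
def stable_consensus_class(labels_by_product):
    """Return the shared class if every product-year label agrees, else None.

    labels_by_product maps a product name to {year: class_name}.
    """
    observed = {
        label
        for yearly in labels_by_product.values()
        for label in yearly.values()
    }
    if len(observed) == 1:
        return observed.pop()
    return None


# Made-up labels for two pixels.
pixel_a = {
    "ESA-CCI": {1992: "cropland", 2000: "cropland", 2015: "cropland"},
    "MCD12Q1": {2001: "cropland", 2015: "cropland"},
    "GlobeLand30": {2000: "cropland", 2010: "cropland"},
    "From-GLC": {2015: "cropland"},
}
pixel_b = {
    "ESA-CCI": {1992: "cropland", 2000: "cropland", 2015: "built-up land"},
    "MCD12Q1": {2001: "cropland", 2015: "built-up land"},
    "GlobeLand30": {2000: "cropland", 2010: "built-up land"},
    "From-GLC": {2015: "built-up land"},
}

print(stable_consensus_class(pixel_a))  # candidate cropland sample
print(stable_consensus_class(pixel_b))  # rejected: the class changed over time
```

Requiring agreement across every product and year is deliberately strict: it discards pixels that changed class as well as pixels on which the products disagree, leaving only candidates whose label can be reused for any year.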

    Overall, the samples were distributed uniformly and randomly throughout the study area; however, for some LULC types with relatively small and patchy areas (such as unused lands) a relatively dense distribution occurred. Therefore, we exported the refined sample point data to ArcGIS and carried out reselection at 1500 m resolution to reduce the correlation between samples. Through this process, we obtained 20 859 reference sample points. Finally, we randomly selected 10 430 points from the sample library to be used for classification training. To ensure the authenticity and reliability of the verification samples, we selected them through visual inspection. In the end, we obtained 2857 reliable sample points in the study area to verify the LULC classification results.
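
The text does not describe how the 1500 m reselection was carried out in ArcGIS. One common way to thin clustered points, shown here purely as an assumed stand-in, is to keep at most one point per 1500 m grid cell. The coordinates are invented and are taken to be in a projected system with metre units.

```python
import random


def thin_points(points, cell_size=1500.0, seed=0):
    """Keep at most one point per square grid cell of side cell_size (metres).

    points is a list of (x, y, label) tuples in a projected coordinate system.
    """
    rng = random.Random(seed)
    shuffled = points[:]
    rng.shuffle(shuffled)  # avoid always favouring the first point listed
    kept = {}
    for x, y, label in shuffled:
        cell = (int(x // cell_size), int(y // cell_size))
        kept.setdefault(cell, (x, y, label))
    return list(kept.values())


# Made-up sample points; the first two fall in the same 1500 m cell.
samples = [
    (100.0, 100.0, "unused land"),
    (900.0, 1200.0, "unused land"),
    (1600.0, 200.0, "unused land"),
    (4000.0, 4000.0, "water bodies"),
]
thinned = thin_points(samples)
print(len(thinned))
```

Grid thinning does not guarantee a minimum spacing of 1500 m, because points in adjacent cells can still be close together, but it does cap the local sample density for small, patchy classes.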

  • The classification process used spatial and temporal stratification, individually training and classifying all 9368 Landsat tiles that cover the YRD. The LULC was divided into the same eight classes used for the sample points (see section 3.2 for details). Because LULC classes are highly susceptible to climatic conditions and topography throughout the YRD territory, this approach allowed the classification models to better identify all LULC classes. To minimize the impact of spatial stratification, part of the training set was shared among different classification models: the sample points from 3 × 3 neighboring scenes were used to train the self-adaptive RF algorithm (Fig. 2). Equally important, the temporal stratification was intended to minimize the impact of spectral and radiometric differences among the Landsat sensors on the classification results, since, for each year, a classifier was trained using only images obtained by a single satellite.
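
The sharing of training samples across a 3 × 3 block of scenes can be sketched as below. Identifying scenes by (path, row) pairs and storing samples in a dictionary are assumptions made for illustration, the scene numbers and sample names are invented, and the wrap-around of path numbers is ignored.

```python
def neighbourhood(path, row):
    """Return the (path, row) ids of the 3 x 3 block centred on a scene."""
    return [(path + dp, row + dr) for dp in (-1, 0, 1) for dr in (-1, 0, 1)]


def training_samples_for(scene, samples_by_scene):
    """Pool the samples of a scene and its eight neighbours.

    samples_by_scene maps (path, row) to a list of sample records.
    Scenes without samples (e.g. outside the study area) are skipped.
    """
    path, row = scene
    pooled = []
    for key in neighbourhood(path, row):
        pooled.extend(samples_by_scene.get(key, []))
    return pooled


# Made-up sample ids keyed by hypothetical scene ids.
samples_by_scene = {
    (119, 38): ["a", "b"],
    (118, 38): ["c"],
    (120, 39): ["d"],
    (122, 38): ["e"],  # too far away to be shared with scene (119, 38)
}
pooled = training_samples_for((119, 38), samples_by_scene)
print(sorted(pooled))
```

Because neighbouring scenes draw on overlapping sample pools, adjacent local models see similar training data, which should reduce visible seams between scenes.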

    Figure 2.  Flowchart of self-adaptive RF (random forest) LULC (Land Use/Land Cover) classification

    The self-adaptive RF models are constructed by the smileRandomForest function in the GEE (Kakooei and Baleghi, 2020). In order to use the smileRandomForest function, two parameters need to be set: the number of decision trees to create per class (numberoftrees) and the minimum size of a terminal (minleaf) (Gumma et al., 2020). Finally, through repeated comparative experiments, we set numberoftrees to 100 and minleaf to 10 in all models.
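
    The sharing of training samples among the 3 × 3 block of scenes can be sketched as follows. This is an illustration under stated assumptions rather than our GEE implementation: scenes are indexed by hypothetical (path, row) pairs, adjacent scenes are assumed to differ by one in path or row, and the function names are invented for this example.

```python
def neighbourhood(path, row):
    """Return the (path, row) indices of the 3 x 3 block centred on a scene."""
    return [(path + dp, row + dr) for dp in (-1, 0, 1) for dr in (-1, 0, 1)]

def pool_training_samples(samples_by_scene, path, row):
    """Collect samples from the target scene and its eight neighbours.

    samples_by_scene maps (path, row) -> list of (features, label) samples.
    Scenes missing from the mapping (for example at the edge of the study
    area) contribute nothing to the pool.
    """
    pooled = []
    for key in neighbourhood(path, row):
        pooled.extend(samples_by_scene.get(key, []))
    return pooled

# Toy example: one sample per scene, with made-up feature values.
samples_by_scene = {
    (118, 38): [([0.21, 0.35], "cropland")],
    (119, 38): [([0.62, 0.10], "forest")],
    (120, 38): [([0.05, 0.02], "water")],
    (119, 39): [([0.30, 0.40], "built-up")],
    (121, 38): [([0.55, 0.15], "forest")],  # outside the 3 x 3 block
}
pooled = pool_training_samples(samples_by_scene, 119, 38)
```

    Each scene's model is then trained on its pooled samples, so neighbouring models share part of their training data, which is what limits abrupt differences at scene boundaries.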

  • To better homogenize these classification results, we applied a spatial filter and a temporal filter, which together minimize abrupt and sometimes unrealistic variations in both dimensions. First, we used a spatial filter in the edge area of each scene, where some pixels have inconsistent classification results. These pixels were merged in accordance with the majority agreement rule by the mode function in GEE (Gorelick et al., 2017). All of the classified scenes were merged into a LULC map collection year by year. Next, we applied a temporal filter to the LULC map collection. For each pixel, we checked whether the LULC classes in year N − 1 and year N + 1 were the same. If so, the LULC class in year N was set to that class. Except for the first and last years, we iterated the above process year by year and obtained the final LULC classification results for the YRD.
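
    The two filters can be sketched for a single pixel as follows. This is a simplified illustration, not the GEE code: it assumes that the temporal filter runs forward in time and reuses already-corrected years, which the text above does not specify, and the function names are hypothetical.

```python
from collections import Counter

def majority_class(labels):
    """Return the most frequent label among overlapping classifications.

    Ties are resolved in favour of the label that appears first.
    """
    return Counter(labels).most_common(1)[0][0]

def temporal_filter(labels):
    """Apply the N - 1 / N + 1 consistency rule to one pixel's yearly labels.

    If the classes in years N - 1 and N + 1 agree, year N takes that class.
    The first and last years are never modified.
    """
    out = list(labels)
    for n in range(1, len(out) - 1):
        if out[n - 1] == out[n + 1]:
            out[n] = out[n - 1]
    return out

series = ["cropland", "built-up", "cropland", "cropland", "forest", "cropland"]
filtered = temporal_filter(series)
merged = majority_class(["forest", "forest", "cropland"])
```

    In this example the isolated built-up and forest labels are replaced by cropland, because the years on either side of each agree.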

    We used several measures to assess and statistically compare the accuracy of our classifications. First, we calculated the classification error matrices for each year. From these matrices, we then quantified overall accuracy (OA). Second, we calculated producer’s accuracy (PA), user’s accuracy (UA), and F1-score, which is the harmonic mean between user’s accuracy and producer’s accuracy. PA, UA, and F1-score can be calculated for each class i as follows:

    $$ {PA}_{i}=\frac{{n}_{ii}}{{\sum }_{j=1}^{r}{n}_{ji}} $$ (1)
    $$ {UA}_{i}=\frac{{n}_{ii}}{{\sum }_{j=1}^{r}{n}_{ij}} $$ (2)
    $$ {\left({F}_{1}\right)}_{i}=\frac{2\times {PA}_{i}\times {UA}_{i}}{{PA}_{i}+{UA}_{i}} $$ (3)

    where r is the number of classes and $n_{ij}$ is the element of the confusion matrix in row j and column i, that is, the count of elements of class j classified as class i. $PA_i$, $UA_i$, and $(F_1)_i$ stand for PA, UA, and F1-score for class i, respectively.

    The F1-score is particularly useful for class-level accuracy assessment, as it gives equal importance to both PA and UA by combining PA and UA into a single measure that can be compared across confusion matrices. Finally, we estimated the unbiased area (using the sample weight obtained with our reference sample dataset) following a standard good practice protocol.
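
    As a worked illustration of these measures, the following minimal sketch computes OA, PA, UA, and F1-score from a confusion matrix. It assumes rows hold the reference classes and columns the classified classes, in line with the definition above; the function name and the two-class matrix are hypothetical and are not taken from our results.

```python
def accuracy_metrics(matrix):
    """Compute OA and per-class (PA, UA, F1) from a square confusion matrix.

    matrix[j][i] is the number of samples of reference class j that were
    classified as class i.
    """
    r = len(matrix)
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(r))
    overall = correct / total
    per_class = []
    for i in range(r):
        ref_total = sum(matrix[i])                        # reference class i
        cls_total = sum(matrix[j][i] for j in range(r))   # classified as i
        pa = matrix[i][i] / ref_total if ref_total else 0.0
        ua = matrix[i][i] / cls_total if cls_total else 0.0
        f1 = 2 * pa * ua / (pa + ua) if (pa + ua) else 0.0
        per_class.append((pa, ua, f1))
    return overall, per_class

# Hypothetical two-class example with 100 verification samples.
matrix = [
    [45, 5],   # reference class 0: 45 correct, 5 classified as class 1
    [10, 40],  # reference class 1: 10 classified as class 0, 40 correct
]
overall, per_class = accuracy_metrics(matrix)
```

    For this matrix the overall accuracy is 0.85, and class 0 has a PA of 0.90 and a UA of about 0.82, which shows how a class can be detected well (high PA) while still absorbing errors from other classes (lower UA).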

  • To evaluate the quality of the 30-m LULC results for the YRD using multi-source LULC products as training samples, an accuracy assessment of the classification was conducted. A confusion matrix, which is the primary tool used in remote sensing for accuracy assessment, was used to evaluate the accuracy. A total of 2857 verification samples were collected for all LULC classes. The resulting error matrix was then used to analyze the accuracy of the 30-m LULC classification results.

    Fig. 3 reports the classification accuracies that quantify the level of agreement between the classification and the sample data. The overall accuracies of the classification results are greater than 84.50% for every year, and the average overall accuracy is 86.33%. This shows that the overall level of classification agreement is high. The annual F1-scores were close to 90% for the following main classes: cropland, forest, and water bodies. The F1-scores of built-up areas ranged between 75% and 85%, mainly due to the misclassification of many small villages in the northern part of the study area. Because shrubland, grassland, wetland, and unused land account for a very small proportion of the study area, we analyzed these LULC classes as a whole. The F1-scores were < 80% for all these classes for each year except for 2014 and 2015. The poor classification accuracy of these classes can be explained by their low proportions in the study area and by the small number of sample points available for them. In general, these results indicate quite reasonable classification accuracies. The results show that using multi-source LULC products to generate training data is a viable and effective option.

    Figure 3.  Classification accuracy for the LULC (Land Use/Land Cover) in the Yangtze River Delta, China from 1992 to 2015

  • By using the Landsat time series data with the self-adaptive RF method, we developed annual LULC datasets for the YRD from 1992 to 2015 (Fig. 4). The LULC classification results showed extensive change across the study area. Specifically, cropland was the most extensive LULC type but continuously decreased from 1992 to 2015. In contrast, built-up land continuously and significantly expanded due to rapid urbanization. We observed that the expanded built-up areas were mainly concentrated near the peri-urban region, possibly due to the topographic conditions. The expansion of built-up land was mainly centered on cropland around the city, which led to the continuous decrease in cropland area. Fig. 5 also indicates that areas of forest and water bodies remained largely unchanged during the study period.

    Figure 4.  Annual LULC (Land Use/Land Cover) maps for the YRD from 1992 to 2015. Maps for four selected years (1992, 2000, 2008, and 2015) are enlarged

    Figure 5.  Area percentage of different LULC (Land Use/Land Cover) classes for the YRD from 1992 to 2015

    Fig. 5 illustrates how the primary LULC types of the YRD changed from 1992 to 2015. It can be seen that the greatest change in the YRD occurred in the case of cropland, whose proportion decreased from 44.6% in 1992 to 42.5% in 2015. This was followed by forest, whose proportion decreased from 37.5% to 36.2%. In contrast, built-up land’s proportion increased from 0.8% to 2.1%. Bodies of water increased by 0.2%. The smallest changes were in grassland, whose proportion decreased by just 0.1%. Shrubland, wetland, and bare land occupy relatively small proportions of the study area. Compared with the primary LULC types (cropland, forest, water bodies, and built-up land) they did not change noticeably. In addition, due to their small proportions in the study area, they were more affected by classification uncertainty. Therefore, changes in these LULC types were excluded from the analysis in this paper.

  • It can be seen from Figs. 6–8 that our results are similar to other LULC products in terms of geographic distribution, and the overall layout of cropland, forest land, and water bodies is similar. Overall, our results are more accurate in identifying small ground objects than other LULC products.

    Figure 6.  Comparison with other common LULC (Land Use/Land Cover) products for the year 2015. The first column shows Landsat-8 OLI images with 30 m resolution, the second is our result with 30 m resolution, the third is ESA-CCI 300 with 300 m resolution, and the fourth is MCD12Q1 with 500 m resolution. Row (a) is a selected typical urban area in plain, row (b) is a selected typical transitional area from mountain to plain, and row (c) is a selected typical mix area

    Figure 7.  Comparison of our results with GlobeLand30 for the year 2010 in the YRD, China

    Figure 8.  Comparison of our results with From-GLC for the year 2015 in the YRD, China

    Compared with the ESA-CCI300 product, the main differences lie in the built-up land in the plain areas and the forest in the mountain areas (Fig. 6). In plain areas, our results identified more small built-up areas, which may be related to their higher spatial resolution relative to the other LULC products. In mountainous areas, our results showed more forest than the other LULC products, whereas ESA-CCI300 classified more of this land as cropland. Compared with the MCD12Q1 product, the main difference is that MCD12Q1 misclassified many areas as shrubland, whereas our method correctly classified them as cropland. In plain areas, large tracts of land are misclassified as built-up areas in MCD12Q1, whereas our results are more refined. We checked these inconsistent areas visually against Landsat-8 OLI images and found that our results were the most accurate among the LULC products compared.

    In Fig. 7, we can see that our results are generally consistent with GlobeLand30 in mountainous areas, and the distribution of forest and water bodies in plain areas is also similar. The main difference is in the distribution of built-up land and cropland in plain areas. GlobeLand30 showed many small built-up areas, whereas our results classify these areas as cropland. This is mainly because, in addition to Landsat data, GlobeLand30 also uses other multispectral images with higher spatial resolution and processes them manually, so that it can classify more built-up land in the mixed pixels of rural areas (Fig. 7 left). Due to the introduction of other multispectral data, however, a mixed pixel will be classified as built-up land whatever percentage of built-up land it contains (Fig. 7 right). In our results, due to the influence of MODIS and ESA data on the training points generated, the number of built-up land samples in rural areas was small. As a result, only contiguous built-up land can be accurately identified in the classification results, while small built-up areas in rural areas showed a poor level of accuracy.

    In Fig. 8, it can be seen that From-GLC 2015 misclassifies part of the cropland as forest in the plain area. Our results do not show such a misclassification. The main reason for this type of misclassification is that there are some spectral differences between different tiles of Landsat data. In the self-adaptive RF classification model, the classification features are mainly phenological features, and the training data from the 3 × 3 adjacent tiles are used for model training; this effectively reduces the impact of spectral differences. The self-adaptive RF model can be applied effectively because we have a sufficient amount of training data.

  • LULC classification over large areas and a long study period is challenging due to the large volume of data pre-processing required and the cost and difficulty of collecting representative training data that enable classification models to be both globally consistent and locally reliable (Millard and Richardson, 2015; Mahdianpari et al., 2020; Liu et al., 2021). With ready-to-use data, such as those from Landsat products, and the employment of non-parametric classifiers, the major challenge is training data collection (Gumma et al., 2020).

    The detailed results illustrated in Figs. 6–8 are representative of the results across the study area. In general, our classification results are similar to other LULC products, which indicates that our results are reasonable. The high overall accuracies and generally high F1-scores in the main classes (which quantify the level of agreement between the classification and the training pool data) underscore the utility of the self-adaptive RF classification approach. However, the effectiveness of the self-adaptive RF classification method depends upon the availability of sufficient local training data. In this study, this was not an issue, as the training pool data derived from multi-source LULC products were geographically well distributed.

    In this study, a novel method was used to classify Landsat data using high quality training data derived from multi-source LULC products. A training data pool was extracted from the multi-source LULC products by judicious quality and k-means clustering filtering. In addition, the training data selection was undertaken in a geographically systematic manner while ensuring that the selected class ratios were the same as the YRD ground object’s ratios. This is advantageous as it: 1) enables the classification to be undertaken in an automated manner without the need for manual training data collection, 2) provides a large geographically distributed training data set, and 3) results in the generation of a 30 m Landsat LULC product with the same classification legend as the multi-source LULC products.

    Although our method of generating training data has many advantages, it also has certain shortcomings. Firstly, due to the limitations of the multi-source LULC products, the training data contained a large number of accurate sample points for contiguous LULC types but fewer for the small LULC types (such as built-up land in rural areas). Secondly, different LULC products have different classification systems, resulting in fewer LULC classes in the training data generated in our study. Finally, to keep the multi-source LULC products consistent in time and data quality, we did not use 30-m LULC products after 2015, because no free global 30-m LULC product for later years was available when we finished the experiment. Therefore, the study period runs from 1992 to 2015. In future work, we will consider whether the Landsat data has changed and generate sample data with more LULC classes and a longer period.

  • Based on four LULC products (ESA-CCI300, MCD12Q1, GlobeLand30, and From-GLC), a training data pool was extracted by judicious quality and k-means clustering filtering. A self-adaptive RF classification was employed to classify 9368 Landsat image tiles. Using these methods, we constructed a map set of LULC in the YRD region from 1992 to 2015 and analyzed the spatiotemporal changes in the main LULC types in the YRD during this period. The primary conclusions of this study are as follows:

    (1) The results indicated that the self-adaptive RF classifier can be used successfully to classify Landsat time series into LULC data, with an average overall classification accuracy of 86.33%. The 30-m resolution LULC classification results for the YRD appeared geographically reasonable and were similar to other LULC products. Therefore, the approach is suitable for analyzing changes in LULC.

    (2) The approach described in this study for generating training samples from multi-source LULC products can provide a large number of sample points of acceptable quality. This is because k-means clustering filtering can effectively eliminate non-homogeneous sample points, so that the generated sample points have higher accuracy.

    (3) The construction of the feature space is also an important advantage of this study. The feature space can eliminate the negative impacts of clouds and rain on the time series data. Moreover, it can reduce the differences between Landsat tiles. Therefore, the construction of the feature space is essential for time series classification in a large cloudy and rainy area (such as the YRD).

    In future work, we will continue to study the use of multi-source LULC products to generate sample points. We will consider applying spectral unmixing models to the sample point generation process, to obtain more sample points of non-dominant LULC types (e.g., built-up land in rural areas). Meanwhile, we will consider extending our research workflow from level 1 to level 2, so that we can determine whether our workflow performs the same way with more LULC types.
