The framework of the community life circle delineation method based on spatiotemporal behavioral demand estimation is shown in Fig. 1. The method can be applied to communities both with and without GPS data.
Figure 1. Framework of the community life circle delineation method based on spatiotemporal behavioral demand estimation
First, the life circles in communities with GPS data were delineated according to the existing context-based crystal-growth (CCG) activity space method. This method included two stages: 1) delineating the 15-min walking-accessible range by using the CCG method based on the environmental contexts and population composition, and 2) delineating the internal structure by conducting a kernel density analysis based mainly on the behavioral data.
Second, the spatiotemporal demand estimation models were constructed by analyzing the relationships among the behavioral demand reflected by the internal structure, the environmental contexts and the population composition. The key obstacle to the applicability of the delineation method is the high cost of obtaining behavioral data. The internal structure, whose identification depends heavily on behavioral data, reflects the spatiotemporal behavioral demand for different areas within the walking-accessible range. Therefore, estimating the behavioral demand from low-cost data, such as environmental contexts and population composition, and delineating the internal structure from these estimates makes it possible to delineate life circles in communities without behavioral data.
To this end, machine learning techniques were applied to construct the demand estimation models. Machine learning has received increasing attention from researchers in urban and behavioral studies, because of its high estimation performance and the relaxation of the assumptions in traditional regressions. In transportation behavior research, machine learning techniques have already been applied for different purposes. For example, decision trees have been used to model the individual spatiotemporal behavior and to derive the behavior decision rules from activity travel data (Arentze et al., 2000; Arentze and Timmermans, 2004; Sammour and Vanhoof, 2018). Regional travel demand has also been modeled and predicted using data mining techniques (Ghasri et al., 2017). Other applications include the modeling of travel mode choices (Tang et al., 2015; Hagenauer and Helbich, 2017; Wang and Ross, 2018), modeling of walking route choices (Tribby et al., 2017), and inferring trip purposes by combining smart card data and activity travel diary data (Alsger et al., 2018). However, machine learning is still underrepresented in the research of spatiotemporal behavioral demand modeling. This study can contribute to the existing research in this regard.
Finally, the life circles in the communities without GPS data were delineated by combining the CCG method with the spatiotemporal behavioral demand estimation models obtained from machine learning. The CCG method was used to delineate the 15-min walking-accessible range based on the environmental contexts and population composition, and the demand estimation model was applied to identify the internal structure within this range. Because this delineation method requires only data that can be acquired at low cost, it can be generalized to communities that have not been surveyed.
The key task in this work was the construction of the spatiotemporal behavioral demand estimation models according to the framework of the delineation method. The construction procedure is shown in Fig. 2. Four stages were implemented to obtain the estimation models: identifying the spatiotemporal behavioral demand of the residents, selecting the explanatory variables, applying the machine learning techniques, and choosing the final models with the lowest error rates.
First, the internal structures identified for the communities with GPS data were treated as the residents’ spatiotemporal behavioral demand for the different plots of land within the 15-min walking-accessible range. The plot of land was used as the basic analysis unit because it is also the basic object in urban planning. Because residents may have different demand patterns for different land use types, separate demand estimation models were developed for five types of land use, namely public service, commercial service, green land, residential land and other types, following the code for the classification of land use for urban and rural planning of Beijing (DB 11/996−2013). The other types consisted of industrial land, utility land and other land types unrelated to the function of the community life circle. The number of GPS points per capita on each plot served as the measure of the spatiotemporal behavioral demand for the plot (Table 1). The absolute number of GPS points is not meaningful in itself because it is affected by the sample size and duration of the behavioral survey. Therefore, the demand for each plot was classified as high or low relative to the median of this value: plots with a value greater than or equal to the median were labeled ‘high demand’, and plots with a value below the median were labeled ‘low demand’.
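The median-based labeling described above can be illustrated with a minimal sketch. The plot identifiers, point counts and sample size below are hypothetical, and the sketch assumes that the median is taken over whichever set of plots is passed in; the text does not state whether the median was computed per land use type or over all plots.

```python
from statistics import median

def label_demand(points_per_plot, sample_size):
    """Label each plot 'high' or 'low' relative to the median per capita GPS count."""
    # Per capita demand: GPS points on the plot divided by the community sample size
    per_capita = {plot: n / sample_size for plot, n in points_per_plot.items()}
    cutoff = median(per_capita.values())
    # Values greater than or equal to the median count as high demand
    return {plot: ("high" if v >= cutoff else "low") for plot, v in per_capita.items()}

# Hypothetical counts of non-work, non-travel GPS points on five plots
counts = {"P1": 120, "P2": 15, "P3": 60, "P4": 0, "P5": 240}
labels = label_demand(counts, sample_size=30)
print(labels)
```

With an odd number of plots, the plot that sits exactly at the median is labeled high demand, which follows the "greater than or equal to" rule in the text.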
Table 1. The variables for construction of the spatiotemporal behavioral demand estimation models

| Category | Variable | Description |
| --- | --- | --- |
| Dependent variable | Spatiotemporal behavioral demand | The total number of non-work, non-travel GPS points outside the home on the plot divided by the sample size of the community, classified as high or low demand according to the median value |
| Explanatory variables | Distance | Distance to the community center of each plot |
| | Area | Area of each plot |
| | Age | Proportion of different age groups, including children (0–14), young adults (15–29), the middle-aged group (30–49) and the elderly group (≥50) |
| | Education | Proportion of residents with education levels below high school (ref. level) and high school and above |
| | Hukou | Proportion of local residents with hukou and migrants without hukou (ref. level) |
| | Public facility density | Number of public facilities divided by the plot area |
| | Commercial facility density | Number of commercial facilities divided by the plot area |
| | Diversity | Simpson diversity index of points of interest (POIs) (Comer and Greene, 2015) |
| | Transit accessibility | Distance to the nearest bus stop |
| Other variable | Land use type | Types of land use, including public service, commercial service, green land, residential land and other types |

Second, explanatory variables were selected to estimate the spatiotemporal behavioral demand, including the land characteristics, community population composition and built environment. Two aspects were considered when selecting these variables. First, the acquisition cost of the variables was required to be low; for example, variables that can be drawn from an existing database support the generalizability of the delineation method. Second, the explanatory variables were required to be correlated with the behavioral demand. In previous studies, researchers considered the distance, demographic variables such as age and education, and the built environment when constructing behavioral models (Tang et al., 2015; Ghasri et al., 2017; Hagenauer and Helbich, 2017). Considering these aspects, nine explanatory variables belonging to three categories were selected (Table 1). Among these variables, the distance and area of each plot represented the basic land characteristics. A larger distance of the plot from the center of the community and a smaller plot area corresponded to a lower spatiotemporal behavioral demand for the plot. The age, education level and hukou status structures represented the different needs of different residents for the land.
In general, the household registration (hukou) status has been noted to be a key socio-economic variable that also influences travel behavior (Li and Liu, 2016; Zhang et al., 2018). Introducing the sociodemographic composition into the estimation models helps ensure that the delineated life circle represents the characteristics of the community. In terms of the built environment, the density of the public and commercial service facilities, the diversity of points of interest (POIs) and the transit accessibility were considered. These variables have been noted to be closely related to spatiotemporal behavior (Ewing and Cervero, 2010; Hagenauer and Helbich, 2017).
Third, different machine learning techniques as well as logistic regression were applied to model the behavioral demand for different types of plots. Logistic regression, which is the most widely used analytical model for behavioral decisions, served as the baseline model against which the strengths of the machine learning techniques were demonstrated. In this work, decision trees and tree-based ensemble learning were applied as the machine learning techniques. Tree-based learning methods have been applied for many purposes, for instance, to analyze the mode choice, trip purpose, travel destination and spatiotemporal behavioral decisions (Arentze and Timmermans, 2004; Hagenauer and Helbich, 2017; Ghasri et al., 2017). These methods generally exhibit a higher performance and efficiency than those of logit models and artificial neural networks (Xie et al., 2003; Tribby et al., 2017).
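Two of the built environment variables in Table 1 can be sketched in a few lines. The example below assumes the common Gini-Simpson form of the diversity index, 1 − Σ p_i², where p_i is the share of POIs in category i; the exact formulation in Comer and Greene (2015) may differ, and the POI categories, counts and plot area are hypothetical.

```python
from collections import Counter

def simpson_diversity(poi_categories):
    """Gini-Simpson index: 1 - sum of squared category shares (0 = one category only)."""
    counts = Counter(poi_categories)
    total = sum(counts.values())
    if total == 0:
        return 0.0  # assumption: a plot without POIs has zero diversity
    return 1.0 - sum((c / total) ** 2 for c in counts.values())

def facility_density(n_facilities, plot_area):
    """Number of facilities divided by the plot area (units follow the inputs)."""
    return n_facilities / plot_area

# Hypothetical POIs on one plot: shares are 0.5, 0.25 and 0.25
pois = ["shop", "shop", "park", "clinic"]
diversity = simpson_diversity(pois)          # 1 - (0.25 + 0.0625 + 0.0625) = 0.625
density = facility_density(len(pois), 2.0)   # 4 facilities on 2.0 area units
print(diversity, density)
```

A plot with a single POI category yields 0, and the index approaches 1 as the POIs spread evenly over many categories.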
Decision trees use a tree-like structure for data classification. Starting from the root, each node recursively splits the data on a feature, and the leaves represent the classes. ID3 and classification and regression trees (CART) are the most widely used decision tree induction algorithms. ID3 uses the entropy measure to choose the attribute at each node. C4.5 improves upon ID3 (Quinlan, 1993), and C5.0 further improves upon C4.5; therefore, C5.0 was employed in this research. The CART algorithm uses the Gini index to measure the purity of the response distribution and to evaluate the splits (Breiman et al., 1984). Although decision trees can effectively manage nonlinear relationships, a single tree is sensitive to noise and tends to overfit (Hagenauer and Helbich, 2017).
Tree-based ensemble techniques combine many decision trees to obtain a higher predictive performance than that of any single classifier. Bagging trains classifiers in parallel by using bootstrap samples. Each classifier has an equal weight, and the majority vote determines the class assignment in the prediction (Breiman, 1996). The random forest (RF) algorithm is similar to bagging: it also trains classifiers using bootstrap samples, but the split at each node of a tree is chosen from a random subset of the variables (Breiman, 2001). Boosting differs from bagging and random forest in that it trains classifiers successively, with each new classifier built to correct the misclassifications of the preceding classifiers. The prediction is based on weighted voting (Freund and Schapire, 1997). In this work, the adaptive boosting (AdaBoost) technique, which is the most commonly implemented type of boosting, was applied.
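The split criteria and voting rules described above can be made concrete with a small sketch. It computes the entropy and Gini impurity of a set of class labels, the impurity reduction of a candidate split, and the equal-weight and weighted votes used by bagging-type and boosting-type ensembles. This is an illustration of the general ideas only, not the internals of the C5.0, CART, random forest or AdaBoost implementations (C4.5, for instance, uses a gain ratio rather than the raw gain), and all labels and weights are hypothetical.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy of the class distribution (the ID3-style measure)."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def gini(labels):
    """Gini impurity of the class distribution (the CART-style measure)."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def split_gain(parent, left, right, impurity):
    """Impurity of the parent minus the size-weighted impurity of the two children."""
    n = len(parent)
    weighted = len(left) / n * impurity(left) + len(right) / n * impurity(right)
    return impurity(parent) - weighted

def majority_vote(predictions):
    """Bagging and random forest style: every tree has an equal vote."""
    return Counter(predictions).most_common(1)[0][0]

def weighted_vote(predictions, weights):
    """Boosting style: each classifier's vote counts in proportion to its weight."""
    totals = Counter()
    for label, weight in zip(predictions, weights):
        totals[label] += weight
    return totals.most_common(1)[0][0]

# Hypothetical node with 4 high-demand and 4 low-demand plots, split into two children
parent = ["high"] * 4 + ["low"] * 4
left = ["high"] * 3 + ["low"]
right = ["high"] + ["low"] * 3
gini_gain = split_gain(parent, left, right, gini)        # 0.5 - 0.375 = 0.125
entropy_gain = split_gain(parent, left, right, entropy)  # about 0.189
print(gini_gain, entropy_gain)
print(majority_vote(["high", "low", "high"]))
print(weighted_vote(["high", "low", "low"], [0.9, 0.3, 0.4]))
```

In the last call the single classifier voting "high" outweighs the two voting "low" because its weight (0.9) exceeds their combined weight (0.7), which is the practical difference between weighted and majority voting.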
All the techniques were implemented for each type of land use in the R programming environment (R Core Team, 2019). The relevant packages for this research were ‘rpart’ (Therneau et al., 2019), ‘C50’ (Kuhn et al., 2020), ‘adabag’ (Alfaro et al., 2013) and ‘randomForest’ (Liaw and Wiener, 2002). The performance of each model was estimated using 10-fold cross-validation to reduce the bias in selecting the training and testing subsets (Kohavi, 1995). The number of trees in each ensemble and the number of variables randomly sampled at each split in the random forest were tuned by trial and error, and the values that produced the lowest error rate were retained. The other parameters were assigned their default values.
Finally, for each type of land use, the model with the lowest error rate was chosen as the final estimation model. For the final models, the variable importance (VI) was determined through the algorithms to examine the effectiveness of the selected explanatory variables. The mean decrease in the Gini index of each variable was used as the VI measure (Ghasri et al., 2017). Although more effective approaches are available to define the VI, the mean decrease in the Gini index was used in this research because it makes the VI comparable across the different ensemble learning models.
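The evaluation procedure can be sketched as follows: the cases are dealt into ten folds, each fold is held out once, and the candidate parameter value with the lowest cross-validated error rate is kept. The study used R packages for tree-based models; the toy one-dimensional nearest-neighbour learner, the synthetic data and the candidate values below are stand-ins chosen only to keep the example self-contained.

```python
import random

def k_fold_indices(n, k=10, seed=0):
    """Shuffle the indices 0..n-1 and deal them into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def cv_error(X, y, fit, k=10, seed=0):
    """Share of held-out cases that are misclassified across the k folds."""
    wrong = 0
    for test_idx in k_fold_indices(len(X), k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(X)) if i not in held_out]
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        wrong += sum(model(X[i]) != y[i] for i in test_idx)
    return wrong / len(X)

def make_knn(n_neighbors):
    """Toy 1-D nearest-neighbour learner; n_neighbors is the parameter being tuned."""
    def fit(X_train, y_train):
        pairs = list(zip(X_train, y_train))
        def predict(x):
            nearest = sorted(pairs, key=lambda p: abs(p[0] - x))[:n_neighbors]
            votes = [label for _, label in nearest]
            return max(("high", "low"), key=votes.count)
        return predict
    return fit

# Synthetic data: plots closer to the community center are labeled high demand
X = list(range(40))
y = ["high" if x < 20 else "low" for x in X]

# Trial and error over candidate values; the same folds are reused for each candidate
candidates = (1, 3, 5)
errors = {c: cv_error(X, y, make_knn(c)) for c in candidates}
best = min(errors, key=errors.get)
print(errors, best)
```

Reusing the same seed for every candidate keeps the folds identical, so differences in the error rate reflect the parameter value rather than the partition.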
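The mean decrease in the Gini index can be sketched under one common formulation: for every tree, the impurity decrease at each split is weighted by the share of cases reaching that node and credited to the splitting variable, and the per-variable totals are then averaged over the trees. The exact scaling used by the R packages may differ, and the two "trees" below are hypothetical records of splits rather than fitted models.

```python
from collections import defaultdict

def mean_decrease_gini(forest):
    """Average, over trees, of the node-weighted Gini decrease credited to each variable.

    Each tree is a list of (variable, cases_at_node, gini_decrease) tuples,
    with the root split listed first.
    """
    totals = defaultdict(float)
    for tree in forest:
        n_root = tree[0][1]  # number of cases at the root of this tree
        for variable, n_node, decrease in tree:
            totals[variable] += (n_node / n_root) * decrease
    return {variable: total / len(forest) for variable, total in totals.items()}

# Hypothetical split records for a two-tree ensemble
forest = [
    [("Distance", 100, 0.20), ("Area", 60, 0.10), ("Distance", 40, 0.05)],
    [("Area", 100, 0.15), ("Diversity", 50, 0.08)],
]
importance = mean_decrease_gini(forest)
ranking = sorted(importance, key=importance.get, reverse=True)
print(importance, ranking)
```

Because the measure is built from the same impurity decreases in every tree-based ensemble, the resulting rankings can be placed side by side across models.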
Delineation of an Urban Community Life Circle Based on a Machine-Learning Estimation of Spatiotemporal Behavioral Demand
Abstract: Delineating life circles is an essential prerequisite for urban community life circle planning. Recent studies combined the environmental contexts with residents’ global positioning system (GPS) data to delineate the life circles. This method, however, is constrained by the availability of GPS data and can be applied only in GPS-surveyed communities. To address this limitation, this study developed a generalizable delineation method that is not constrained by behavioral data. According to previous research, the community life circle consists of the walking-accessible range and the internal structure. Because the range can be delineated primarily from environmental data, the core task in developing the generalizable method was to estimate the spatiotemporal behavioral demand for each plot of land and thereby acquire the internal structure of the life circle. Therefore, behavioral demand estimation models were established through logistic regression and machine learning techniques, including decision trees and ensemble learning. The model with the lowest error rate was chosen as the final estimation model for each type of land. Finally, we used a community without GPS data as an example to demonstrate the effectiveness of the estimation models and the delineation method. This article extends the existing literature by introducing spatiotemporal behavioral demand estimation models, which learn the relationships between the environmental contexts, the population composition and the existing GPS-based delineation results in order to delineate the internal structure of the community life circle without employing behavioral data. Furthermore, the proposed method and delineation results also contribute to facility adjustments and location selections in life circle planning, people-oriented transformation in urban planning, and activity space estimation of the population in evaluating and improving urban policies.

Figure 3. Survey area: a) location of Qinghe Sub-district, Beijing; b) locations of the surveyed communities in Qinghe Sub-district, Beijing. ANBL: Anning Beilu; ANDL: Anning Donglu; ANL: Anningli; DDJY: Dangdai Chenshi Jiayuan; HQY: Haiqingyuan; LDJY: Lidu Jiayuan; LXGG: Lingxiu Guigu; MFN: Maofangnan; MHY: Meiheyuan; MKY: Mingkeyuan; QSY: Qingshangyuan; XFS: Xuefushu Jiayuan; YMJY: Yimei Jiayuan; ZXY: Zhixueyuan
[1] Ahas R, Aasa A, Yuan Y et al., 2015. Everyday space–time geographies: using mobile phone-based sensor data to monitor urban activity in Harbin, Paris, and Tallinn. International Journal of Geographical Information Science, 29(11): 2017–2039. doi: 10.1080/13658816.2015.1063151
[2] Alfaro E, Gamez M, García N, 2013. Adabag: an R package for classification with boosting and bagging. Journal of Statistical Software, 54(2): 1–35. doi: 10.18637/jss.v054.i02
[3] Alsger A, Tavassoli A, Mesbah M et al., 2018. Public transport trip purpose inference using smart card fare data. Transportation Research Part C: Emerging Technologies, 87: 123–137. doi: 10.1016/j.trc.2017.12.016
[4] Arentze T A, Hofman F, Van Mourik H et al., 2000. Using decision tree induction systems for modeling space-time behavior. Geographical Analysis, 32(4): 330–350. doi: 10.1111/j.1538-4632.2000.tb00431.x
[5] Arentze T A, Timmermans H J P, 2004. A learning-based transportation oriented simulation system. Transportation Research Part B: Methodological, 38(7): 613–633. doi: 10.1016/j.trb.2002.10.001
[6] Breiman L, Friedman J, Stone C J et al., 1984. Classification and Regression Trees. London: Chapman and Hall/CRC.
[7] Breiman L, 1996. Bagging predictors. Machine Learning, 24(2): 123–140. doi: 10.1023/A:1018054314350
[8] Breiman L, 2001. Random forests. Machine Learning, 45(1): 5–32. doi: 10.1023/A:1010933404324
[9] Chai Y W, 2014. From socialist danwei to new danwei: a daily-life-based framework for sustainable development in urban China. Asian Geographer, 31(2): 183–190. doi: 10.1080/10225706.2014.942948
[10] Chai Yanwei, Zhang Xue, Sun Daosheng, 2015. A study on life circle planning based on space time behavioural analysis: a case study of Beijing. Urban Planning Forum, (3): 61–69. (in Chinese)
[11] Chai Yanwei, Li Chunjiang, 2019. Urban life cycle planning: from research to practice. City Planning Review, 43(5): 9–16, 60. (in Chinese)
[12] Chai Yanwei, Li Chunjiang, Xia Wanqu et al., 2019. Study on the delineation model of urban community life circle: based on Qinghe district in Haidian district, Beijing. Urban Development Studies, 26(9): 1–8, 68. (in Chinese)
[13] Comer D, Greene J S, 2015. The development and application of a land use diversity index for Oklahoma City, OK. Applied Geography, 60: 46–57. doi: 10.1016/j.apgeog.2015.02.015
[14] Cui Zhenzhen, Huang Xiaochun, He Lianna et al., 2016. Study on urban life convenience index based on POI data. Geomatics World, 23(3): 27–33. (in Chinese)
[15] Douglass M, Wissink B, Van Kempen R, 2012. Enclave urbanism in China: consequences and interpretations. Urban Geography, 33(2): 167–182. doi: 10.2747/0272-3638.33.2.167
[16] Ewing R, Cervero R, 2010. Travel and the built environment: a meta-analysis. Journal of the American Planning Association, 76(3): 265–294. doi: 10.1080/01944361003766766
[17] Freund Y, Schapire R E, 1997. A decision-theoretic generalization of on-line learning and an application to boosting. Journal of Computer and System Sciences, 55(1): 119–139. doi: 10.1006/jcss.1997.1504
[18] Ghasri M, Rashidi T H, Waller S T, 2017. Developing a disaggregate travel demand system of models using data mining techniques. Transportation Research Part A: Policy and Practice, 105: 138–153. doi: 10.1016/j.tra.2017.08.020
[19] Guo Rong, Li Yuan, Huang Mengshi, 2019. Research on optimization strategy of walking network in 15-minute community life circle of Harbin. Planners, 35(4): 18–24. (in Chinese)
[20] Hagenauer J, Helbich M, 2017. A comparative study of machine learning classifiers for modeling travel mode choice. Expert Systems with Applications, 78: 273–282. doi: 10.1016/j.eswa.2017.01.057
[21] Han Zenglin, Li Yuan, Liu Tianbao et al., 2019. Spatial differentiation of public service facilities’ configuration in community life circle: a case study of Shahekou district in Dalian city. Progress in Geography, 38(11): 1701–1711. (in Chinese). doi: 10.18306/dlkxjz.2019.11.006
[22] Kohavi R, 1995. A study of cross-validation and bootstrap for accuracy estimation and model selection. In: Proceedings of the 14th International Joint Conference on Artificial Intelligence. San Francisco: ACM, 1137–1145.
[23] Kuhn M, Weston S, Coulter N et al., 2020. C50: C5.0 Decision Trees and Rule-Based Models. R package version 0.1.3.1. Available at: https://cran.r-project.org/web/packages/C50/C50.pdf
[24] Kwan M P, 2007. Mobile communications, social networks, and urban travel: hypertext as a new metaphor for conceptualizing spatial interaction. The Professional Geographer, 59(4): 434–446. doi: 10.1111/j.1467-9272.2007.00633.x
[25] Li S M, Liu Y, 2016. The jobs-housing relationship and commuting in Guangzhou, China: Hukou and dual structure. Journal of Transport Geography, 54: 286–294. doi: 10.1016/j.jtrangeo.2016.06.014
[26] Liaw A, Wiener M, 2002. Classification and regression by randomForest. R News, 2(3): 18–22.
[27] Liu T B, Chai Y W, 2015. Daily life circle reconstruction: a scheme for sustainable development in urban China. Habitat International, 50: 250–260. doi: 10.1016/j.habitatint.2015.08.038
[28] Loo B P Y, Wang B, 2018. Factors associated with home-based e-working and e-shopping in Nanjing, China. Transportation, 45(2): 365–384. doi: 10.1007/s11116-017-9792-0
[29] Ministry of Housing and Urban-Rural Development of the People’s Republic of China, 2018. Standard for urban residential area planning and design. http://www.mohurd.gov.cn/wjfb/201811/t20181130_238590.html. Cited 15 January 2020. (in Chinese)
[30] Municipal Bureau of Planning and Natural Resources of Shanghai, 2016. Shanghai planning guidance of 15-minute community life circle. http://ghzyj.sh.gov.cn/zcfg/ghss/201609/t20160902_693401.html. Cited 15 January 2020. (in Chinese)
[31] Municipal Bureau of Planning and Natural Resources of Ji’nan, 2019. Ji’nan planning guidance of 15-minute community. http://jnup.jinan.gov.cn/art/2019/1/31/art_10231_2824958.html. Cited 15 January 2020. (in Chinese)
[32] Perchoux C, Chaix B, Cummins S et al., 2013. Conceptualization and measurement of environmental exposure in epidemiology: accounting for activity space related to daily mobility. Health & Place, 21: 86–93. doi: 10.1016/j.healthplace.2013.01.005
[33] Quinlan J R, 1993. C4.5: Programs for Machine Learning. San Mateo: Morgan Kaufmann Publishers.
[34] R Core Team, 2019. R: a Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing.
[35] Rainham D, McDowell I, Krewski D et al., 2010. Conceptualizing the healthscape: contributions of time geography, location technologies and spatial ecology to place and health research. Social Science & Medicine, 70(5): 668–676. doi: 10.1016/j.socscimed.2009.10.035
[36] Sammour G, Vanhoof K, 2018. A validation measure for computational scheduler activity-based transportation models based on sequence alignment methods. Transportation Planning and Technology, 41(7): 736–751. doi: 10.1080/03081060.2018.1504183
[37] Schwanen T, Kwan M P, 2008. The internet, mobile phone and space-time constraints. Geoforum, 39(3): 1362–1377. doi: 10.1016/j.geoforum.2007.11.005
[38] Sharp G, Denney J T, Kimbro R T, 2015. Multiple contexts of exposure: activity spaces, residential neighborhoods, and self-rated health. Social Science & Medicine, 146: 204–213. doi: 10.1016/j.socscimed.2015.10.040
[39] Sun Daosheng, Chai Yanwei, Zhang Yan, 2016. The definition and measurement of community life circle: a case study of Qinghe area in Beijing. Urban Development Studies, 23(9): 1–9. (in Chinese)
[40] Sun Daosheng, Chai Yanwei, 2017. Study on the urban community life sphere system and the optimization of public service facilities: a case study of Qinghe area in Beijing. Urban Development Studies, 24(9): 7–14, 25. (in Chinese)
[41] Tang L, Xiong C F, Zhang L, 2015. Decision tree method for modeling travel mode switching in a dynamic behavioral process. Transportation Planning and Technology, 38(8): 833–850. doi: 10.1080/03081060.2015.1079385
[42] The Central Committee of the Communist Party of China and the State Council of China, 2014. The National New-type Urbanism Plan. http://www.gov.cn/gongbao/content/2014/content_2644805.htm. Cited 15 January 2020. (in Chinese)
[43] Therneau T, Atkinson B, Ripley B, 2019. Rpart: Recursive Partitioning for Classification. R package version 4.1–15. Available at: https://repo.bppt.go.id/cran/web/packages/rpart/rpart.pdf
[44] Thulin E, Vilhelmson B, Schwanen T, 2020. Absent friends? Smartphones, mediated presence, and the recoupling of online social contact in everyday life. Annals of the American Association of Geographers, 110(1): 166–183. doi: 10.1080/24694452.2019.1629868
[45] Tribby C P, Miller H J, Brown B B et al., 2017. Analyzing walking route choice through built environments using random forests and discrete choice techniques. Environment and Planning B: Urban Analytics and City Science, 44(6): 1145–1167. doi: 10.1177/0265813516659286
[46] Wang Bo, Zhen Feng, Wei Zongcai et al., 2015. A theoretical framework and methodology for urban activity spatial structure in e-society: empirical evidence for Nanjing City, China. Chinese Geographical Science, 25(6): 672–683. doi: 10.1007/s11769-015-0751-4
[47] Wang F R, Ross C L, 2018. Machine learning travel mode choices: comparing the performance of an extreme gradient boosting model with a multinomial logit model. Transportation Research Record: Journal of the Transportation Research Board, 2672(47): 35–45. doi: 10.1177/0361198118773556
[48] Wang J, Kwan M P, Chai Y W, 2018. An innovative context-based crystal-growth activity space method for environmental exposure assessment: a study using GIS and GPS trajectory data collected in Chicago. International Journal of Environmental Research and Public Health, 15(4): 703. doi: 10.3390/ijerph15040703
[49] Wu Qiuqing, 2015. The exploration on the dynamic programming of community in megacities from the living circle perspective. Shanghai Urban Planning Review, (4): 13–19. (in Chinese)
[50] Xi G, Zhen F, Cao X et al., 2020b. The interaction between e-shopping and store shopping: empirical evidence from Nanjing, China. Transportation Letters, 12(3): 157–165. doi: 10.1080/19427867.2018.1546797
[51] Xi G L, Cao X Y, Zhen F, 2020a. The impacts of same day delivery online shopping on local store shopping in Nanjing, China. Transportation Research Part A: Policy and Practice, 136: 35–47. doi: 10.1016/j.tra.2020.03.030
[52] Xiao Jinghao, Zhou Dailin, Hu Jiapei, 2018. Measurement and evaluation method of community life-cycle based on decision tree theory: Panyu district of Guangzhou. Planners, 34(3): 91–96. (in Chinese)
[53] Xie C, Lu J Y, Parkany E, 2003. Work travel mode choice modeling with data mining: decision trees and neural networks. Transportation Research Record: Journal of the Transportation Research Board, 1854(1): 50–61. doi: 10.3141/1854-06
[54] Xu Xiaoyan, Ye Peng, 2010. On the relationship between self-sufficiency and location of urban community facilities. Urban Problems, (3): 62–66. (in Chinese)
[55] Yu Yifan, 2019. From traditional residential area planning to neighborhood life circle planning. City Planning Review, 43(5): 17–22. (in Chinese)
[56] Zhang M Z, He S J, Zhao P J, 2018. Revisiting inequalities in the commuting burden: institutional constraints and job-housing relationships in Beijing. Journal of Transport Geography, 71: 58–71. doi: 10.1016/j.jtrangeo.2018.06.024
[57] Zhen F, Cao Y, Qin X et al., 2017. Delineation of an urban agglomeration boundary based on Sina Weibo microblog ‘check-in’ data: a case study of the Yangtze River Delta. Cities, 60: 180–191. doi: 10.1016/j.cities.2016.08.014