Human mobility reflects important characteristics of human behavior, which serves as a critical moderator among the social, economic, and environmental systems of cities. Understanding mobility and its implications requires reliable mobility data products, among which road traffic metrics such as traffic density, flow, and speed are often needed. Road traffic metrics are now widely applied in areas including routing, commuting, and transportation planning, as well as in the study of accessibility, traffic emissions, and social/environmental justice.
As stated by statistician George Box, “all models are wrong.”[1] Data, an artifact of models and algorithms, unarguably have their imperfections. In the production of mobility data (especially big data), errors, inconsistencies, and biases are likely to be introduced by the algorithms involved in the data lifecycle. To ensure that ethical decisions can be made when these data are applied for other purposes, transparency throughout the lifecycle of mobility data needs to be emphasized. The production of mobility data should be observable by humans so that the uncertainties of the data can be inspected and audited, and the consequences of data applications can be examined and fully revealed.
The current focus of our work is to facilitate the transparent and ethical production of mobility data, or more specifically, road traffic metrics. Publicly available traffic camera data are used to implement this study. Traffic cameras have rapidly emerged as a primary data source for transportation management and control, particularly in the United States and Canada. Real-time images from these cameras are typically available to the public through state Departments of Transportation and are free of restrictions on accessing, distributing, and creating derivative works from the data. The openness of traffic camera data makes them well suited to the purpose of this work.
The quality of the road traffic metrics derived from the camera feeds is highly dependent on the accuracy of the identified vehicles. While many effective vehicle detection methods (e.g., YOLOv4, RetinaNet) have been developed recently, their accuracy varies significantly among different camera configurations and environments, especially for vehicles that appear small in the image. In this work, a quadtree-based algorithm is developed to repeatedly partition the image extent until only regions with high detection accuracy remain. These regions are referred to as the high-accuracy identification regions (HAIR), which are then used to derive reliable road traffic metrics such as traffic density.
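The partitioning idea can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names `Region`, `find_hair`, and `toy_accuracy`, the accuracy threshold, and the depth limit are all hypothetical, and the per-region accuracy is replaced by a toy function that simply assumes detections are more reliable lower in the frame, where vehicles appear larger.

```python
from dataclasses import dataclass
from typing import Callable, List

IMAGE_HEIGHT = 480  # hypothetical image size, used only by the toy accuracy function


@dataclass(frozen=True)
class Region:
    """Axis-aligned rectangle in image coordinates (origin at top-left)."""
    x: float
    y: float
    w: float
    h: float

    def split(self) -> List["Region"]:
        """Return the four equal quadrants of this region."""
        hw, hh = self.w / 2, self.h / 2
        return [
            Region(self.x, self.y, hw, hh),            # top-left
            Region(self.x + hw, self.y, hw, hh),       # top-right
            Region(self.x, self.y + hh, hw, hh),       # bottom-left
            Region(self.x + hw, self.y + hh, hw, hh),  # bottom-right
        ]


def find_hair(region: Region,
              accuracy_fn: Callable[[Region], float],
              threshold: float = 0.8,
              max_depth: int = 4,
              depth: int = 0) -> List[Region]:
    """Keep a region if its accuracy meets the threshold; otherwise split it
    into quadrants and recurse, discarding regions that are still below the
    threshold once the depth limit is reached."""
    if accuracy_fn(region) >= threshold:
        return [region]
    if depth >= max_depth:
        return []
    kept: List[Region] = []
    for child in region.split():
        kept.extend(find_hair(child, accuracy_fn, threshold, max_depth, depth + 1))
    return kept


def toy_accuracy(region: Region) -> float:
    """Stand-in for a measured detection accuracy (an assumption for this
    sketch): the normalized vertical position of the region's center."""
    return (region.y + region.h / 2) / IMAGE_HEIGHT


image = Region(0, 0, 640, 480)
hair = find_hair(image, toy_accuracy, threshold=0.6, max_depth=2)
for r in hair:
    print(r)
```

In practice `accuracy_fn` would be computed from detector output compared against labeled images for each camera. The depth limit reflects a trade-off: deeper partitioning isolates accurate regions more finely but leaves fewer labeled vehicles per region to evaluate against.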
While the use of HAIR can significantly enhance the estimation of road traffic metrics, it also introduces errors when unrepresentative image inputs or a low level of partitioning are used. We explicitly present such errors by introducing an accuracy measure called the regional average precision, which helps to actively inform end users of the potential uncertainty of the data. The data sources, algorithm, code, and derived data products are made accessible to the public through an online computational platform called CyberGIS-Jupyter. This will not only facilitate replicating the data production in different areas, but also enable in-depth examination of how uncertainty in the data may have unintended societal impacts.
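The abstract does not define the regional average precision. One plausible reading, sketched below purely as an assumption, is the standard average precision (area under the interpolated precision-recall curve) computed only over the detections and ground-truth vehicles whose centers fall inside a given region. The function names, the tuple layouts, and the premise that each detection has already been matched to ground truth and labeled as a true or false positive are all hypothetical.

```python
from typing import List, Optional, Tuple

Rect = Tuple[float, float, float, float]         # (x, y, width, height)
Detection = Tuple[float, float, float, bool]     # (center_x, center_y, score, is_true_positive)


def in_region(cx: float, cy: float, region: Rect) -> bool:
    """True if the point (cx, cy) lies inside the rectangle."""
    x, y, w, h = region
    return x <= cx < x + w and y <= cy < y + h


def regional_average_precision(detections: List[Detection],
                               gt_centers: List[Tuple[float, float]],
                               region: Rect) -> Optional[float]:
    """Average precision restricted to one region (assumed definition).
    Returns None when the region contains no ground-truth vehicles."""
    n_gt = sum(in_region(cx, cy, region) for cx, cy in gt_centers)
    if n_gt == 0:
        return None

    # Keep only detections inside the region, highest confidence first.
    dets = sorted((d for d in detections if in_region(d[0], d[1], region)),
                  key=lambda d: d[2], reverse=True)

    precisions, recalls, tp = [], [], 0
    for rank, (_, _, _, is_tp) in enumerate(dets, start=1):
        tp += int(is_tp)
        precisions.append(tp / rank)
        recalls.append(tp / n_gt)

    # Interpolate: make precision non-increasing as recall grows.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])

    # Sum precision over each increase in recall.
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap


region = (0, 0, 100, 100)
gt_centers = [(10, 10), (50, 50), (200, 200)]
detections = [(10, 10, 0.9, True), (30, 30, 0.8, False),
              (50, 50, 0.7, True), (200, 200, 0.95, True)]
print(regional_average_precision(detections, gt_centers, region))
```

Reporting a value like this alongside each region's traffic metrics is one way the uncertainty could be surfaced to end users; the measure actually used in this work may differ in how detections are matched and how the curve is integrated.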
[1] Box, G. E. (1976). Science and statistics. Journal of the American Statistical Association, 71(356), 791-799.