Towards Ethical and Transparent Mobility Data Products: Estimation of Road Traffic Metrics from Publicly Available Camera Feeds

Human mobility reflects important characteristics of human behavior, which serves as a critical moderator among the social, economic, and environmental systems of cities. Understanding mobility and its implications requires reliable mobility data products, and road traffic metrics such as traffic density, flow, and speed are among those most often needed. Today, road traffic metrics are widely applied in areas including routing, commuting, and transportation planning, as well as the study of accessibility, traffic emissions, and social and environmental justice.

As stated by statistician George Box, “all models are wrong.”[1] Data, as artifacts of models and algorithms, inevitably have their imperfections. In the production of mobility data (especially big data), errors, inconsistencies, and biases are likely to be introduced by the algorithms involved in the data lifecycle. To ensure that ethical decisions can be made when these data are applied for other purposes, transparency across the lifecycle of mobility data needs to be emphasized. The production of mobility data should be observable by human subjects so that the uncertainties of the data can be inspected and audited, and the consequences of data applications can be examined and fully revealed.

The current focus of our work is to facilitate the transparent and ethical production of mobility data, or more specifically, road traffic metrics. Publicly available traffic camera data are used in this study. Traffic cameras have rapidly emerged as a primary data source for transportation management and control, particularly in the United States and Canada. Real-time images from these cameras are typically available to the public through state Departments of Transportation, free from restrictions on accessing, distributing, and creating derivative works from the data. This openness makes traffic camera data well suited to the purpose of this work.

Figure 1. Traffic cameras and images in Central Ohio obtainable from OHGO.

The quality of the road traffic metrics derived from the camera feeds depends heavily on how accurately vehicles are identified. While many effective vehicle detection methods (e.g., YOLOv4, RetinaNet) have been developed recently, their accuracy varies significantly across camera configurations and environments, especially for vehicles that appear small in the image. In this work, a quadtree-based algorithm is developed to repeatedly partition the image extent until only regions with high detection accuracy remain. These regions are referred to as the high-accuracy identification regions (HAIR), which are then used to derive reliable road traffic metrics such as traffic density.
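The partitioning idea can be illustrated with a minimal sketch. It is not the implementation used in this work: the function name find_hair, the accuracy_of callback, the 0.8 threshold, and the depth limit are all assumptions made for illustration. A region is kept when its detection accuracy meets the threshold; otherwise it is split into four quadrants, and regions still below the threshold at the maximum depth are discarded.

```python
from typing import Callable, List, Tuple

# A region is (x_min, y_min, x_max, y_max) in pixel coordinates.
Box = Tuple[float, float, float, float]


def find_hair(
    extent: Box,
    accuracy_of: Callable[[Box], float],  # hypothetical: detection accuracy inside a region
    threshold: float = 0.8,               # assumed cut-off for "high accuracy"
    max_depth: int = 4,                   # assumed limit on the level of partitioning
    depth: int = 0,
) -> List[Box]:
    """Return the regions whose detection accuracy meets the threshold."""
    if accuracy_of(extent) >= threshold:
        return [extent]  # accurate enough: keep the whole region
    if depth >= max_depth:
        return []        # still inaccurate at the finest level: discard
    x0, y0, x1, y1 = extent
    xm, ym = (x0 + x1) / 2, (y0 + y1) / 2
    quadrants = [
        (x0, y0, xm, ym), (xm, y0, x1, ym),
        (x0, ym, xm, y1), (xm, ym, x1, y1),
    ]
    kept: List[Box] = []
    for quadrant in quadrants:
        kept.extend(find_hair(quadrant, accuracy_of, threshold, max_depth, depth + 1))
    return kept


def toy_accuracy(box: Box) -> float:
    """Toy stand-in: regions in the lower half of a 640x480 image are accurate."""
    return 0.9 if box[1] >= 240 else 0.5


regions = find_hair((0, 0, 640, 480), toy_accuracy, max_depth=3)
```

In this toy setting, only the two lower quadrants of the image are retained. In practice, accuracy_of would be estimated by comparing detector output against labeled images for each camera.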

While the use of HAIR can significantly enhance the estimation of road traffic metrics, it can also introduce errors when unrepresentative image inputs or a low level of partitioning are used. We explicitly present such errors by introducing an accuracy measure called the regional average precision, which actively informs end users of the potential uncertainty of the data. The data sources, algorithm, code, and derived data products are made accessible to the public through an online computational platform called CyberGIS-Jupyter. This will not only facilitate replicating the data production in different areas, but also enable in-depth examination of how uncertainty in the data may have unintended societal impacts.
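The abstract does not define the regional average precision, so the following is only a sketch of one plausible reading: the standard (non-interpolated) average precision, computed over the detections and ground-truth vehicles whose centers fall inside a given region. The function name, the input layout, and the assumption that each detection has already been matched to ground truth are all hypothetical.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]           # (x_min, y_min, x_max, y_max)
Detection = Tuple[float, float, float, bool]      # (center_x, center_y, confidence, is_true_positive)
Point = Tuple[float, float]                       # ground-truth vehicle center


def _inside(region: Box, x: float, y: float) -> bool:
    x0, y0, x1, y1 = region
    return x0 <= x < x1 and y0 <= y < y1


def regional_average_precision(
    region: Box, detections: List[Detection], ground_truths: List[Point]
) -> float:
    """Non-interpolated average precision restricted to one region (assumed definition)."""
    n_gt = sum(1 for (x, y) in ground_truths if _inside(region, x, y))
    if n_gt == 0:
        return 0.0
    in_region = [d for d in detections if _inside(region, d[0], d[1])]
    ranked = sorted(in_region, key=lambda d: d[2], reverse=True)
    true_positives = 0
    precision_sum = 0.0
    for rank, (_, _, _, is_tp) in enumerate(ranked, start=1):
        if is_tp:
            true_positives += 1
            precision_sum += true_positives / rank  # precision at this recall step
    return precision_sum / n_gt


# Made-up example values, for illustration only.
example_region = (0, 0, 100, 100)
example_detections = [
    (10, 10, 0.9, True), (20, 20, 0.8, False),
    (30, 30, 0.7, True), (150, 150, 0.95, False),  # last one lies outside the region
]
example_truths = [(10, 10), (30, 30), (50, 50), (200, 200)]
example_ap = regional_average_precision(example_region, example_detections, example_truths)
```

Reporting a value like this for each retained region alongside the derived metrics would give end users a direct indication of how much to trust the counts from that part of the image.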

[1] Box, G. E. (1976). Science and statistics. Journal of the American Statistical Association, 71(356), 791-799.

 

Yue Lin, PhD Student

The Department of Geography

The Ohio State University

 

2 thoughts on “Towards Ethical and Transparent Mobility Data Products: Estimation of Road Traffic Metrics from Publicly Available Camera Feeds”

  1. I’m curious why the ethical questions are bigger for mobility data than presumably other data sources – is it simply because we can identify individuals, versus census data where aggregates are reported? Is mobility data necessarily more “private”? How or why does it matter if it’s private industry generating the technology rather than government agencies? What do you think about the development of new measurement technologies, possibly in advance of the social science understanding to guide what’s appropriate (or not)?

    • These are very good points! Both census data and mobility data are problematic in the sense that nobody knows what’s going on in the data production. Mobility data are not more important or bigger than census data, but we will be focusing on mobility data because they have been emerging rapidly in recent years, and there is still a lack of discussion about how to produce ethical data products from sensitive human traces. But I do think that at some point it would be helpful to think of these spatial and “personal” data as a whole.

      I think a major difference between government agencies and private industry in terms of generating and applying new technologies is that with private industry, it is much more difficult to ask them to make the algorithms/models used in the data production totally open so that people know what they are doing. The algorithms/models are proprietary since the companies need to keep them confidential to be competitive. It is difficult to balance the interests here.

      I think it is natural for people not to trust some new technologies when they come into use. Transparency is important for building trust because those who are involved as data subjects or other agents have the right to be informed of what the consequences/implications will be.
