Data Science

Data science uses scientific methods, processes, algorithms, and systems to extract knowledge and insights from diverse data, including semantic web data and both structured and unstructured forms of “big data.” It draws on techniques ranging from machine learning and statistics to natural language processing and linear algebra. Data science is at the heart of advances in numerous domains, including business analytics, medical intelligence, and intelligent navigation.


ST-Elevation Myocardial Infarction: Left and Right of the Boom

Partner: Abbott Northwestern Hospital - Allina Health
In collaboration with medical practitioners, the project applies machine learning approaches to a registry of more than 5,000 patients to identify hidden patterns and relationships in the practice of cardiology.

Transnational Partnership for Excellent Research and Education in Big Data and Emergency Management — Norway Research Council

Partners: Western Norway Research Institute, University of Bergen (Norway), National Institute of Informatics (Japan), Hong Kong Polytechnic University, University of Tokyo, George Mason University, Illinois Institute of Technology
The project is developing innovative curricular content and original research in response to public- and private-sector needs. The selection of CICS as one of three U.S. partners in this global initiative reflects its international reputation in the big data and emergency management domains.

Modeling the Socio-Spatial Network of Wild Pigs

Partner: Tejon Ranch Conservancy
Wild pigs cause more than $1 billion in economic damage annually in the United States. Effective population control requires a better understanding of their spatio-temporal and social patterns. To that end, CICS researchers developed a novel set of systematic, network-based conceptualizations and implemented an interactive analytics dashboard that helps wildlife specialists understand social and spatial interactions within the wild pig population.
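
The page does not spell out the network conceptualizations, but one common way to turn animal tracking data into a social network is a co-location network. The sketch below is a hypothetical illustration, not the CICS method: it assumes GPS fixes arrive as `(animal_id, time_step, x_m, y_m)` tuples in a projected coordinate system measured in meters, and that two animals are linked whenever they are within an assumed distance `max_dist_m` at the same time step.

```python
# Hypothetical sketch: build a co-location (proximity) network from GPS fixes.
# Assumptions (not from the project): fixes are (animal_id, time_step, x_m, y_m)
# in meters, and "associated" means within max_dist_m at the same time step.
from collections import defaultdict
from itertools import combinations
from math import hypot


def colocation_network(fixes, max_dist_m=50.0):
    """Return {(a, b): count} of time steps in which animals a and b were close."""
    # Group fixes by time step so only simultaneous positions are compared.
    by_time = defaultdict(dict)
    for animal, t, x, y in fixes:
        by_time[t][animal] = (x, y)

    edges = defaultdict(int)
    for positions in by_time.values():
        for a, b in combinations(sorted(positions), 2):
            (xa, ya), (xb, yb) = positions[a], positions[b]
            if hypot(xa - xb, ya - yb) <= max_dist_m:
                edges[(a, b)] += 1  # edge weight = number of co-located time steps
    return dict(edges)


fixes = [
    ("p1", 0, 0.0, 0.0), ("p2", 0, 30.0, 40.0), ("p3", 0, 500.0, 0.0),
    ("p1", 1, 10.0, 0.0), ("p2", 1, 20.0, 0.0), ("p3", 1, 40.0, 0.0),
]
print(colocation_network(fixes))
```

Edge weights count co-located time steps, so animals that associate often end up with heavier links; the resulting dictionary can be loaded into a graph library for centrality or community analysis.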

Expert Versus Data-driven Independent Variable Selection

A combination of machine learning, clustering, and regression approaches was applied to more than 900 hydrologic basins across the United States. To improve predictions for ungauged basins, the project investigated how the number and information content of independent variables affected model performance, and compared data-driven and expert-based approaches to variable selection.
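
The page does not say which data-driven selection method was used. As a hedged illustration of the contrast, the sketch below applies a simple correlation filter (rank candidate variables by absolute Pearson correlation with the target and keep the top `k`) and sets the result beside an expert-chosen list. The variable names, values, function names, and the filter itself are assumptions made for illustration.

```python
# Hypothetical sketch: a data-driven filter for independent-variable selection.
# Assumption: a simple |Pearson r| ranking stands in for the project's (unstated)
# data-driven procedure, to contrast it with an expert-chosen variable list.
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def select_by_correlation(candidates, target, k):
    """Return the k candidate names with the largest |r| against the target."""
    scores = {name: abs(pearson_r(vals, target)) for name, vals in candidates.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Toy basin attributes (hypothetical names and values) and a streamflow-like target.
candidates = {
    "precip_mm": [800, 950, 1100, 1300, 1500],
    "mean_elev_m": [300, 280, 350, 310, 290],
    "forest_frac": [0.2, 0.3, 0.45, 0.5, 0.7],
}
target = [120, 160, 210, 260, 320]

expert_choice = ["precip_mm", "mean_elev_m"]
data_driven_choice = select_by_correlation(candidates, target, k=2)
print("expert:", expert_choice)
print("data-driven:", data_driven_choice)
```

A fuller comparison would fit a model on each variable set and compare out-of-sample error (for example, on held-out basins), since a univariate filter like this one ignores redundancy among the selected variables.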

Geographic Information Science and Technology Body of Knowledge — National Science Foundation

Partners: CUNY - Hunter College, New Mexico State University, Brigham Young University
A computational framework was developed for re-engineering the GIS&T Body of Knowledge (GIS&T BoK). It combines the wikification of the BoK editing process, the transformation of the previous concept hierarchy into a semantic network, an expansion of the ontology vocabulary, and a series of web services to deliver semantic web content to user applications.
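
One way to picture turning a concept hierarchy into a semantic network is to re-express each parent/child link as explicit relations in both directions, then add cross-links that a strict tree cannot hold. This sketch assumes relation names borrowed from the W3C SKOS vocabulary (`skos:broader`, `skos:narrower`, `skos:related`); the topic names and the functions `hierarchy_to_triples` and `neighbors` are hypothetical, and the actual BoK ontology may differ.

```python
# Hypothetical sketch: convert a concept tree into subject-predicate-object triples.
# Assumption: SKOS-style relation names; topics below are illustrative only.


def hierarchy_to_triples(hierarchy, related=()):
    """hierarchy: {parent: [children]}; related: iterable of (a, b) cross-links."""
    triples = set()
    for parent, children in hierarchy.items():
        for child in children:
            # The single tree edge becomes two navigable relations.
            triples.add((child, "skos:broader", parent))
            triples.add((parent, "skos:narrower", child))
    for a, b in related:
        # Cross-links between branches are what a plain hierarchy cannot express.
        triples.add((a, "skos:related", b))
        triples.add((b, "skos:related", a))
    return triples


def neighbors(triples, concept):
    """All (predicate, object) pairs leaving a concept, sorted for stable output."""
    return sorted((p, o) for s, p, o in triples if s == concept)


hierarchy = {
    "Cartography and Visualization": ["Map Design", "Geovisualization"],
    "Analytics and Modeling": ["Network Analysis"],
}
triples = hierarchy_to_triples(hierarchy, related=[("Geovisualization", "Network Analysis")])
print(neighbors(triples, "Geovisualization"))
```

Stored as triples, the network can be serialized to RDF and served through web services, which is the kind of delivery the framework describes.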

Gridded Time Series Analysis of Snow Dynamics in the Northern Hemisphere

For climate research, the applicability of a popular neural network technique was extended beyond its traditional use in multivariate clustering. The visual-analytic potential of the method was significantly enhanced through a series of conceptual, computational, and visual transformations, including a novel multivariate trajectory technique.
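
The technique is not named on this page. The sketch below assumes it is a self-organizing map (SOM) and shows, in minimal form, how a trained map can turn a time series of multivariate observations into a trajectory of best-matching units (BMUs). Grid size, decay schedules, the two toy snow variables, and all function names are illustrative assumptions, not the project's settings or its actual trajectory technique.

```python
# Hypothetical sketch: a tiny self-organizing map (SOM) plus a BMU trajectory.
# Assumptions: the neural network technique is a SOM; schedules are illustrative.
import math
import random


def best_matching_unit(weights, x):
    """Grid coordinate of the node whose weight vector is closest to x."""
    return min(weights, key=lambda n: sum((w - v) ** 2 for w, v in zip(weights[n], x)))


def train_som(data, rows=3, cols=3, epochs=200, seed=0):
    rng = random.Random(seed)
    dim = len(data[0])
    # Initialize each node's weight vector with small random values.
    weights = {(r, c): [rng.random() for _ in range(dim)]
               for r in range(rows) for c in range(cols)}
    radius0 = max(rows, cols) / 2
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = 0.5 * (1 - frac)                    # learning rate decays linearly
        radius = max(radius0 * (1 - frac), 0.5)  # neighborhood shrinks over time
        for x in rng.sample(data, len(data)):    # visit samples in shuffled order
            bmu = best_matching_unit(weights, x)
            for node, w in weights.items():
                d2 = (node[0] - bmu[0]) ** 2 + (node[1] - bmu[1]) ** 2
                h = math.exp(-d2 / (2 * radius ** 2))  # Gaussian neighborhood weight
                for i in range(dim):
                    w[i] += lr * h * (x[i] - w[i])     # pull node toward the sample
    return weights


def trajectory(weights, series):
    """Map a time-ordered list of vectors to the sequence of BMUs it visits."""
    return [best_matching_unit(weights, x) for x in series]


# Toy vectors: [snow cover fraction, scaled snow depth] (hypothetical values).
data = [[0.9, 0.8], [0.85, 0.75], [0.5, 0.4], [0.45, 0.5], [0.1, 0.05], [0.15, 0.1]]
som = train_som(data)
one_cell_over_time = [[0.9, 0.8], [0.5, 0.4], [0.1, 0.05]]
print(trajectory(som, one_cell_over_time))
```

Drawing the returned grid coordinates as a connected path over the map is what turns a clustering tool into a view of how one location's conditions move through attribute space over time.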

Accuracy of Models for Mapping the Medical Sciences — National Institutes of Health

Partners: SciTech Strategies Inc., Indiana University, Collexis Holdings Inc
The project investigated the accuracy of different similarity approaches for clustering more than two million biomedical documents. It was designed to develop robust answers to the question of which similarity approach generates the most coherent clusters. Further contributions include the creation of the largest neural network model of the biomedical sciences ever built, leveraging supercomputing and parallelization.
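
To make "similarity approach" concrete, the sketch below compares two standard document similarity measures: cosine similarity on term counts and Jaccard similarity on term sets. These two measures and the toy documents are illustrative choices, not necessarily among the approaches evaluated in the project, which worked at a scale of millions of documents.

```python
# Hypothetical sketch: two document similarity measures on tokenized text.
# Assumption: cosine and Jaccard are illustrative, not the project's measures.
from collections import Counter
from math import sqrt


def cosine(tokens_a, tokens_b):
    """Cosine similarity between term-count vectors."""
    ca, cb = Counter(tokens_a), Counter(tokens_b)
    dot = sum(ca[t] * cb[t] for t in ca)
    norm_a = sqrt(sum(v * v for v in ca.values()))
    norm_b = sqrt(sum(v * v for v in cb.values()))
    return dot / (norm_a * norm_b)


def jaccard(tokens_a, tokens_b):
    """Share of distinct terms the two documents have in common."""
    sa, sb = set(tokens_a), set(tokens_b)
    return len(sa & sb) / len(sa | sb)


d1 = "cardiac arrest cardiac care".split()
d2 = "cardiac care unit".split()
d3 = "gene expression cardiac".split()
for name, other in [("d1-d2", d2), ("d1-d3", d3)]:
    print(name, round(cosine(d1, other), 3), round(jaccard(d1, other), 3))
```

On these toy documents the two measures rank the pairs the same way but assign different scores, so any similarity threshold, and therefore cluster boundaries, can shift with the choice of measure.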


Big Data Visualization and Spatiotemporal Modeling of Aggressive Driving

According to the AAA (American Automobile Association) Foundation for Traffic Safety, more than half of fatal traffic crashes are due to aggressive driving. Ubiquitous technology now makes it possible to monitor driver behavior at high frequency over long periods of time, giving researchers an opportunity to investigate risky driving behavior at a large scale.

In collaboration with the Safety through Disruption (Safe-D) National University Transportation Center (UTC), this project aims to develop a big data analytics framework and visualization tool that use data from emerging technologies to model aggressive driving behavior in space and time and to classify and visualize it. As an essential safety planning tool in the era of big data, the framework and tool can be used to identify where and when aggressive driving occurs.
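
As a hypothetical illustration of the classification and spatiotemporal steps, the sketch below flags harsh acceleration and braking from consecutive speed samples, then counts events per grid cell and hour of day. The input format `(t_seconds, lat, lon, speed_mps)`, the 3.0 m/s² threshold, the 0.01-degree grid, UTC hours, and the function names are all assumptions, not parameters from the Safe-D project.

```python
# Hypothetical sketch: flag harsh acceleration/braking, then count by place and hour.
# Assumptions: 1 trip of (t_seconds, lat, lon, speed_mps) samples, a 3.0 m/s^2
# threshold, a 0.01-degree grid, and UTC hours; none of these come from the project.
from collections import Counter

HARSH_ACCEL_MPS2 = 3.0  # hypothetical threshold


def harsh_events(samples, threshold=HARSH_ACCEL_MPS2):
    """Return (t, lat, lon, kind) for each step whose |acceleration| >= threshold."""
    events = []
    for (t0, _, _, v0), (t1, lat, lon, v1) in zip(samples, samples[1:]):
        dt = t1 - t0
        if dt <= 0:
            continue  # skip duplicate or out-of-order timestamps
        accel = (v1 - v0) / dt
        if abs(accel) >= threshold:
            kind = "harsh_accel" if accel > 0 else "harsh_brake"
            events.append((t1, lat, lon, kind))
    return events


def hotspot_counts(events, cell_deg=0.01):
    """Count events per (grid cell, UTC hour of day)."""
    counts = Counter()
    for t, lat, lon, _ in events:
        cell = (round(lat / cell_deg), round(lon / cell_deg))
        hour = int(t // 3600) % 24
        counts[(cell, hour)] += 1
    return counts


trip = [
    (28800, 32.7157, -117.1611, 10.0),
    (28801, 32.7158, -117.1612, 14.0),  # +4.0 m/s in 1 s -> harsh acceleration
    (28802, 32.7159, -117.1613, 14.5),
    (28803, 32.7160, -117.1614, 9.0),   # -5.5 m/s in 1 s -> harsh braking
]
events = harsh_events(trip)
print(events)
print(hotspot_counts(events))
```

Aggregating many trips this way yields cell-by-hour counts that can be mapped or tested for hot spots, which is one simple reading of "where and when aggressive driving occurs."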


Integrated Stage-based Evacuation with Social Perception Analysis and Dynamic Population Estimation

The research will help emergency response agencies better understand public perceptions and needs during disaster events and create more effective evacuation plans for local communities. This project will integrate multiple data sources, including social media, census surveys, geographic information systems (GIS) data layers, volunteer suggestions, and remote sensing data, to develop an integrated wildfire evacuation decision support system (IWEDSS) for the County of San Diego as a demonstration prototype. IWEDSS will consist of four core modules: dynamic population estimation, stage-based robust evacuation models, social perception analysis, and a web-based geospatial analytics platform. It will offer scientifically based, data-driven analytic tools that help evacuation planners, resource managers, and decision makers make efficient and effective decisions that can reduce evacuation time and the potential number of injuries and deaths. The research team will develop IWEDSS together with staff from the Office of Emergency Services (OES) of San Diego County, the San Diego/Imperial Counties Chapter of the American Red Cross, and 2-1-1 San Diego.

Mapping Ideas from Cyberspace to Realspace

This NSF-funded project seeks to map both the geography and the chronology of ideas in cyberspace as ripples of information usage radiate outward from a given event epicenter. By mapping and analyzing these ripples, the project aims to provide new insights into the role of new media in biasing, accelerating, impeding, or otherwise influencing personal, social, and political uses of such information.

Patent Filings

“Methods and Systems for Base Map and Inference Mapping”
Inventors: Skupin, A. and Du, F.

“Knowledge Reference System and Method”
Inventors: Skupin, A., Plewe, P., Ahearn, S., and Icke, I.



Computer-assisted map production techniques with emphasis on map design and color use.

Current development of Internet mapping and cartographic skills for web-based maps (multimedia, animation, and interactive design). Fundamental theories of distributed GIS to support Internet mapping with focus on distributed component technologies, Internet map servers, and web services.

Big data science to include analysis, data collection, filtering, GIS, machine learning, processing, text analysis, and visualization. Computational platforms, skills, and tools for conducting big data analytics with real world case studies and examples.

Spatial analytic techniques from image processing, remote sensing, geographic information systems, cartography or quantitative methods. May be repeated with new content. See Class Schedule for specific content.