What data / tools exist for mapping of disease trends?

I was just looking at the Google Flu Trends map. Google Insight could be used to gain similar information for other trends in disease keyword searches.

I started wondering how valid are keyword searches to represent the spread of diseases. In some ways I think it could be a good representative, but in others I think it would be susceptible to influence by mass media.

Do maps of clinical data exist that could be compared to this Google data? If so, where could I find some? =)

You can use official sources such as hospital admissions, prescriptions for drugs that fight the disease you are tracking, and sales of over-the-counter medicines. The CDC, WHO, and the EuroFlu Weekly Electronic Bulletin map official clinical data; Aurametrix uses these sources.

Several scientific studies have compared GFT, Twitter, and even Facebook with official sources. A Johns Hopkins study (pubmed/22230244) used data from local hospitals in addition to CDC data. My own sampling studies also showed that social trends are surprisingly good, despite all the noise, but provide good estimates only for highly populated areas.
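One simple way to make this kind of comparison is to line up a weekly search index against weekly official counts and compute a correlation coefficient. The sketch below is only an illustration: the two series are invented numbers, not GFT or CDC data, and the function name pearson_r is my own. A high correlation says the curves move together; it does not say the search data is free of media-driven spikes.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length (at least 2 values)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("a constant series has no defined correlation")
    return cov / sqrt(var_x * var_y)


# Hypothetical weekly values, invented for illustration only.
search_index = [12, 15, 22, 35, 50, 41, 28, 17]            # keyword search volume
reported_cases = [110, 130, 190, 310, 450, 400, 260, 150]  # official case counts

r = pearson_r(search_index, reported_cases)
print(f"Pearson r = {r:.3f}")
```

In practice you would also want to check lagged correlations (search data shifted by a week or two) and look at each region separately, since the agreement tends to be weaker in sparsely populated areas.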

You might also want to check Sickweather's algorithm, HealthMap, USGS disease maps, and industry databases covering pharmaceutical companies and product sales.

For myself, I've not been terribly impressed with the use of keyword searches and the like as methods of predicting influenza or other diseases. They are, when it comes down to it, somewhat glorified extensions of a syndromic surveillance system, just for "I feel sick" instead of "Honey, pop down to CVS and pick up some NyQuil?"

It's also apt to get you some odd results. I tried to use a "Google-style" analysis to match some rotavirus data to Google searches - I believe I was 7 entries down before I ever got to anything rotavirus related, and even that was "rotavirus", which doesn't suggest it would be useful as a symptom detector, but rather people Googling a disease their doctor said they had. The top results, some of which fit really well? A series of prom dress-related searches. In order for it to be useful, you'd have to extend Google's system.

But, as a thought experiment, if you'd like to match it to data, there are a number of potential data sources publicly available from the CDC. The Morbidity and Mortality Weekly Report, for example, publishes weekly counts for a number of notifiable diseases.

The Canadian Cancer Data Tool (CCDT) provides comprehensive data on the incidence and mortality of cancer in Canada over time by age and sex in an easy-to-use, flexible format. Information on 24 different categories of cancer types, including overall cancer, is available. The Public Health Agency of Canada developed this tool using Statistics Canada’s Canadian Cancer Registry (CCR) and Canadian Vital Statistics Death (CVSD) database. Data for the CCR and CVSD.

These 10 medical breakthroughs will change the world

Every day, medical innovations lengthen and improve lives across the globe. Over the course of the next decade, as twenty-first century technologies combine and accelerate, healthcare is set for a revolution.

The GLOBAL INNOVATION INDEX 2019 (GII), a report from the World Intellectual Property Organization and its research partners Cornell University and INSEAD, identifies five global trends driving this transformation: broadband access, developments in artificial intelligence and the human genome, changing business models, and the rise of consumerism.

These trends are leading to breakthroughs across a range of medical frontiers. For this year’s GII, the NIH identified 10 of the cutting-edge emerging technologies most likely to revolutionize healthcare over the next decade.

Here’s a closer look at the technologies that made the list:

1. Single cell analysis
Likely to be one of the first of the 10 breakthroughs to come to fruition, single-cell analysis will allow scientists to study individual cells in their normal environment for the first time. The ability to determine which genes are turned on or off in individual cells, and to decode how immune cells attack healthy tissue, will transform how we approach autoimmune diseases and how we combat the deadly process of cancer metastasis.

2. Mapping the brain
The human brain remains one of science’s most daunting frontiers. The NIH’s Brain Research through Advancing Innovative Neurotechnologies (BRAIN) initiative is accelerating our understanding of this most complex and critical organ. Within a decade, researchers will have mapped the circuits responsible for motor function, vision, memory and emotion. This will lead to new approaches to a raft of neurological disorders including autism, epilepsy, brain injuries, schizophrenia, Parkinson’s, Alzheimer’s and spinal cord injuries.

3. Alzheimer’s Disease
Aided by new imaging techniques developed and optimized by the BRAIN Initiative, NIH research indicates that within a decade we will be able to identify individuals at high risk of Alzheimer’s before symptoms even appear. Early interventions will slow or change the course of the disease, providing profound human and economic benefits.

4. Spinal cord injuries
A decade from now we will have developed effective treatments for spinal cord injuries. Already, ground-breaking research supported by NIH has enabled several young men paralyzed from the waist down to move their legs through the use of surgically implanted electrical stimulators that bypass the severed spinal cord. Soon, many of the millions of people worldwide coping with spinal cord damage could be given back freedom of movement.

5. Pain management
Chronic pain is a serious and costly public health problem affecting tens of millions of people worldwide. Unfortunately, current treatments can be addictive, leading to tragic outcomes. The NIH recently launched the Helping to End Addiction Long-Term (HEAL) initiative, harnessing genomics, neuroscience and structural biology to uncover entirely new targets for treating chronic pain.

6. Regenerative medicine
This exciting field of research looks at ways of replacing or regenerating human tissues and organs when they are damaged. Methods range from stimulating the body’s own repair mechanisms, to growing tissues and organs in the laboratory. In a decade, regenerative medicine could change the course of chronic diseases like diabetes, and eliminate the problems associated with tissue and organ transplants, including sourcing, waiting lists, tissue rejection and the need for anti-rejection drugs.

7. Cancer immunotherapy
This radical new approach enlists the cancer patient’s immune system, with one promising strategy involving collecting immune cells and engineering them to produce special receptors, called chimeric antigen receptors, that turn the cells into cancer-fighting warriors. This work has already saved the lives of adults and children with untreatable blood cancers, and sights are set on tougher targets including breast, prostate, colon, ovarian and pancreatic cancer.

8. New vaccines
In the next 10 years, important strides will be made in preventing HIV, flu and other infectious diseases. NIH is funding research into a universal flu vaccine that will provide long-lasting protection against a wide range of flu strains. This will prepare us for the next overdue worldwide pandemic, potentially saving millions of lives.

9. Gene editing to cure disease
Scientists have identified the molecular causes of nearly 6,500 human diseases, yet treatments currently exist for only about 500 (see chart below). By 2030, science will have begun to realize the promise of genetic technologies to treat and cure diseases that once seemed out of reach. Gene editing tools like Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR-Cas) allow the correction of gene mutations – with a cure for sickle cell disease being one of its first targets.

10. Precision medicine
Our physiological processes are unique, and in 10 years, medicine will have begun to reflect that reality with diagnosis, treatments and healthcare delivery tailored to each individual. In the US, the NIH-led All of Us Research Program is recruiting 1 million volunteers to pioneer the merging and analysis of wide-ranging data that will help ensure that people from all walks of life, all around the world, will be healthier than ever.

Geographic Information System Data

Place is one of the basic tenets of a field investigation. Both the who and the when of disease are relative to and often dependent on the where. Geographic information science, systems, software (collectively known as GIS) and methods are among the tools epidemiologists use in defining and evaluating the where. This chapter reviews GIS applications as they pertain to the 10 steps of a field investigation.

Generating Maps for Situational Awareness

Standard mapping techniques will produce informative visualizations and provide orientation for studying the location, the physical attributes of the investigation area, and descriptive characteristics of the population(s) of interest. Field staff should begin by creating general reference maps (1,2). Google Maps (Google, Inc., Mountain View, CA), OpenStreetMap (an open-source wiki software by the OpenStreetMap Foundation), or county geographic files serve as reasonable starting points. These maps can include information about road networks, hotels, airports, and other points of interest to familiarize the field team with the area in which it will be investigating the disease or injury occurrence. Reference maps can be useful in both domestic and international settings, especially in unfamiliar areas.

Additionally, such maps are useful for establishing the boundaries of the investigation area (2–4). Using geographic information science, systems, or software (collectively known as GIS), boundaries can be drawn for the area of interest, from which specific GIS data files, known as shapefiles, can be created (2,4,5). These boundary files can then be used to evaluate variables of interest (e.g., estimating the number of persons residing within a particular area or examining the extent of contamination from a harmful exposure) (Figure 17.1).

Identifying and Acquiring Pertinent Supplemental Data

Time permitting, the field team might consider gathering pertinent data sets useful beyond general reference data. For example, incorporating such sociodemographic characteristic data as population counts, age, sex, race/ethnicity, sensitive populations, language/translation needs, and measures of poverty by specific state, county, or other census boundaries is possible by using US Census data (Figure 17.2). The US Census Bureau makes these data available with a unique geographic identifier, thereby enabling easy association between the population data and the location data in GIS (6).

Understanding the influence of the natural and built environment is possible with GIS (1–3). For example, exploring the distribution of persons in communities and neighborhoods, school locations, childcare facilities, or senior living facilities relative to the locations of industry might prove key to the investigation. During a natural disaster or a chemical release, imagery can be useful for understanding the extent of damage, tracking population movements, and guiding the planning and logistics of travel for fieldwork. Furthermore, identifying transportation routes and locations of public utilities might be pertinent to understanding potential transmission modes (Figure 17.3).

GIS can be a principal resource for generating a sampling plan. By using GIS, investigators can select homes or areas within communities for sampling activities (Figure 17.4).

Similarly, investigators can use road network data to develop optimized routes for data collection. Before fieldwork in Panama, for example, researchers used GIS to characterize varying levels of forestation adjacent to villages for study site selection (7). Additionally, the researchers used maps to determine each village’s accessibility.

Depending on the study area’s location (i.e., domestic vs. international), different levels of data might be available. In domestic situations, current and historic data from the US Census Bureau and satellite imagery may be readily available. This might also be true in certain international settings; however, obtaining this information before deployment might be difficult. In those instances, investigators might need to rely on dated or minimally detailed information before beginning fieldwork. Under these circumstances, the team should consider collecting pertinent data after arriving at the location.

Selecting GIS Software and Equipment

Both commercial and open-source GIS packages offer useful software options (4,8). Additionally, statistical software packages with spatial analysis capabilities exist. When selecting a GIS package, the user should consider data collection, analysis, and visualization needs, as well as available technical and financial resources, in determining which package is most feasible.

Developing GIS Capacity

The beginning of the investigation is often the best time to collaborate with a GIS subject matter expert (SME) because that person can provide advice regarding pertinent maps, data, and analysis plans. Engaging GIS SMEs from the beginning can also build GIS capacity among the field team. During 2017, for example, a team from the Center for Global Health of the Centers for Disease Control and Prevention (CDC) collaborated with GIS SMEs in the Geospatial Research, Analysis, and Services Program to determine the best methods for collecting, storing, and analyzing locations where sex workers were active in Papua New Guinea (9). That collaboration resulted in a plan to determine locations to conduct the surveys, to implement methods for collecting location data, and to enable spatial data analysis, which led to development of an interactive mapping tool. After completion, not only had relevant data been collected, but the team had also begun to develop internal GIS capacity.

  • General reference maps can provide situational awareness.
  • Maps can be useful for setting the boundaries of the investigation area.
  • Maps can be instrumental in developing a sampling plan.
  • Publicly available data (e.g., US Census Bureau and health outcome data) can be mapped and evaluated for the particular area of interest.
  • Imagery data also might be informative, especially when attempting to assess damage from natural disasters.
  • Inviting GIS SMEs to participate in the planning process can build field team capacity.

The field investigator might begin extracting location information provided directly from laboratory reports. The patient’s residential street address at the time of diagnosis is often collected along with specimens for laboratory testing. Therefore, these data should be readily available when a laboratory or hospital reports its results. If this information is unavailable through laboratory reports or other electronic records, the field team should consider whether location information will be important to the analysis and determine methods for collecting those data.

GIS for Determining Populations at Risk

When determining whether a particular health outcome is occurring at a greater than expected rate, the correct population at risk must be determined. This often involves estimating a population within a specified geographic area. Census data are readily available at varying geographic units (e.g., block, census tract, county, and state) in files easily processed in GIS (6). Similarly, evaluating census data with GIS can assist in identifying a relevant comparison population. Often, these preexisting geopolitical boundaries are sufficient for estimating population characteristics. However, this is not always the case. For example, wind patterns may carry a contaminant to only a portion of a county or across multiple census tracts, creating nonstandard shapes. GIS can calculate the area of interest and be used to estimate the proportion of area of interest relative to known geopolitical boundaries. This proportion can then be applied to population data to estimate the population of interest (Figure 17.5).
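The proportional-area estimate described above can be sketched in a few lines. This is a minimal illustration, assuming population is spread evenly within each census unit (an assumption that is often violated in practice); the tract names, populations, and areas are invented, and in a real investigation the overlap areas would come from a GIS intersection of the exposure boundary with the census polygons.

```python
def estimate_population(tracts):
    """Area-weighted population estimate for a nonstandard area of interest.

    Each tract contributes its population multiplied by the share of its
    area that falls inside the area of interest. Assumes uniform density
    within each tract.
    """
    total = 0.0
    for tract in tracts:
        if not 0 <= tract["overlap_km2"] <= tract["area_km2"]:
            raise ValueError(f"overlap exceeds tract area for {tract['name']}")
        proportion = tract["overlap_km2"] / tract["area_km2"]
        total += tract["population"] * proportion
    return total


# Hypothetical tracts; overlap_km2 is the part covered by the plume boundary.
tracts = [
    {"name": "Tract A", "population": 4000, "area_km2": 10.0, "overlap_km2": 2.5},
    {"name": "Tract B", "population": 2500, "area_km2": 5.0, "overlap_km2": 5.0},
    {"name": "Tract C", "population": 6000, "area_km2": 20.0, "overlap_km2": 0.0},
]

estimate = estimate_population(tracts)
print(f"Estimated population in area of interest: {estimate:.0f}")
```

Here Tract A contributes a quarter of its residents, Tract B all of them, and Tract C none. The same proportions can be applied to other counts (e.g., persons in poverty) under the same uniform-distribution assumption.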

General site profile maps not only provide an overview of the location and spatial association of the study area to other general points of reference, but also present estimates of general characteristics of an affected population.


Determining populations and population characteristics. Geographic information system methods provide the means for determining population estimates within specific geographic areas for populations of particular interest. In these maps, population count, percentage of people 65 years old, and percentage of people in poverty based on 2014 American Community Survey Estimates are shown by census tract.


Network analysis. Using road or public transportation networks provides a more accurate analysis of travel times, distances, and connectivity than more traditional buffer methods. Through network analysis, this series of maps demonstrates the change in access to pharmacies as a result of Hurricane Maria’s impact on the island of Puerto Rico.


Using geographic information systems for sampling. As shown here, housing data, roads, and neighborhood information can be used to develop a sampling plan when conducting fieldwork.


Applying geographic information systems (GIS) to estimate populations of interest within specified areas. Through GIS, it is possible to estimate sociodemographic characteristics when boundaries of interest do not conform to standard political boundaries. These estimates can be calculated by allocating the same proportion of geographic area included in the boundary of interest to sociodemographic characteristics.

Exploring Rates Across Space and Time

Preliminary spatial and temporal analyses of baseline rates can be useful at this early investigation stage for establishing an outbreak’s existence. Through spatial and temporal methods, changing rates of disease or injury across time might become apparent. A series of static maps can present temporal trends of disease distributions (Figure 17.6).
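As a rough illustration of comparing a current rate against a baseline, the sketch below computes crude rates per 100,000 population and their ratio. The case counts and population are invented, and a crude rate ratio is only one possible screening measure; it is not presented here as the chapter's method and ignores age adjustment and statistical uncertainty.

```python
def rate_per_100k(cases, population):
    """Crude rate per 100,000 population."""
    if population <= 0:
        raise ValueError("population must be positive")
    return cases / population * 100_000


def rate_ratio(current_rate, baseline_rate):
    """Ratio of the current rate to the baseline rate."""
    if baseline_rate <= 0:
        raise ValueError("baseline rate must be positive")
    return current_rate / baseline_rate


# Hypothetical county: same population in both periods, for simplicity.
population = 60_000
baseline_rate = rate_per_100k(12, population)  # cases in the baseline period
current_rate = rate_per_100k(18, population)   # cases in the current period

ratio = rate_ratio(current_rate, baseline_rate)
print(f"baseline {baseline_rate:.1f}, current {current_rate:.1f}, ratio {ratio:.2f}")
```

Computing the same two rates for every geographic unit and mapping the ratio gives a quick first look at where rates appear to have changed.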

Linked micromaps, another type of map series, can display rates of disease in the same area across time or different population groups (3,8,11). Additionally, interactive software is available for animating disease distributions across time.

Uncovering Risk Factors

Analysis of environmental risk factors (e.g., wind direction, wind speed, or drinking water sources) can assist in uncovering a common exposure route. GIS can also be used for exploring and defining social networks crucial in understanding disease spread. As the transmission source is determined, a common location might also be revealed. At the least, the field investigator can begin thinking of methods for obtaining location information of patients and suspected locations where infection might be occurring to begin generating hypotheses regarding exposure and transmission factors.


  • Geographic boundaries can be customized to the particular study area and can be used to estimate underlying population counts and characteristics.
  • Preliminary spatial and temporal analyses can provide evidence of unusual disease rates across time.
  • Visualization can inform the team of potential transmission factors and changing disease patterns across different places and times.

Maps in a series provide an efficient means of presenting different aspects of the same data simultaneously. The series can represent rates among populations with different sociodemographic characteristics, or it might be used to explore changes over time. The decrease in age-adjusted mortality rates in Georgia can be seen in data from 2000, 2008, and 2016.

Collecting and Geocoding Location Data

The street address at the time of disease diagnosis, whether a residence or common establishment, is usually part of routine data collection efforts; however, remembering that location information should be collected in a standardized format is crucial (3). After collection, these data can be converted into points or shapes for mapping, a process known as geocoding (3,12,13). Providing specific instructions for collecting complete and accurate address information can influence correct point placement (3,12).

The field team can determine early on the preferred method for collecting geographic coordinates during data collection in the field (2). Given inconsistencies in complete and accurate address data availability, variability in geocoding software accuracy, and the wide availability of handheld global positioning system (GPS) devices, collecting GPS coordinates might prove better than address data. Additionally, obtaining address-level data for analysis in international settings can be challenging, especially in remote locations where standardized addresses may not be available for data collection or for the geocoding process. For example, throughout the Ebola virus disease epidemic during 2014–2016, infection spread rapidly. In one particularly remote village, the rapid identification of infected persons and their isolation was essential for limiting transmission. During that fieldwork, the investigator was able to use a GPS device to collect the latitude and longitude of each household location while collecting interview data (14). Having these household locations enabled spatiotemporal analysis of transmission risk factors. Without the point locations, examining risk factors at the household level may not have been possible.
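Once household coordinates have been collected, even a very small script can start answering spatial questions such as which households lie within a given distance of a case household. The sketch below uses the haversine great-circle formula with a mean Earth radius of 6,371 km. The household IDs and coordinates are invented placeholders, not data from the investigation cited above, and the helper names are my own.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; adequate for short distances


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two latitude/longitude points."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = radians(lat2 - lat1)
    dlam = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def households_within(households, center, radius_km):
    """IDs of households no farther than radius_km from the center point."""
    lat0, lon0 = center
    return [
        hid
        for hid, (lat, lon) in households.items()
        if haversine_km(lat0, lon0, lat, lon) <= radius_km
    ]


# Hypothetical GPS readings (latitude, longitude) keyed by household ID.
households = {
    "H1": (8.4800, -13.2300),
    "H2": (8.4805, -13.2295),
    "H3": (8.5200, -13.2300),
}

nearby = households_within(households, households["H1"], radius_km=0.5)
print(nearby)
```

Straight-line distance is only a first approximation; where terrain or roads constrain movement, a network-based distance would be more appropriate.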

In addition to GPS data, network location information from a cellular device might be used to identify location. Today, almost any standard cellular device can generate geographic data. In one instance of fieldwork in Africa, the field investigators used the geotagging function on their cellular phones to take pictures inside their pockets to document their locations.

Beyond Points on a Map

A common misconception is that location can represent only a single position. Collecting spatial data in formats other than points (e.g., lines or polygons) is additionally informative (2–4). Moreover, certain spatial data can represent abstract ideas (e.g., activity space) (15,16). Activity space can include places of employment, houses of worship, residences, restaurants, points of food purchase, recreational areas, friends’ residences, and anywhere else the persons of interest might have frequented. Therefore, when in the field, an investigator should not be limited to recording a street address or assigning a single georeference point.

Visualizing distribution points of contaminated products through road network data can be informative during the case identification process. Other spatial data that might be of interest reflect the movement of materials between facilities and the points of interaction with affected populations. Food products can undergo a lengthy trip from production or harvesting to the consumer, and every location along the route, even the route itself, can be a source of risk. Processing locations (e.g., water treatment plants or heating, ventilation, and air conditioning handlers) also can be sources of risk. Collecting data about the locations of and connections between these networks can aid the team’s understanding of the risk factors.

In addition, GIS and location information can be useful in understanding the impact of specific interventions or changes in the natural or built environment. For example, in Atlanta, Georgia, GIS and location information were used to study possible health impacts to residents resulting from the development of a city “BeltLine” to improve urban walkability and enhance active commuting (17,18). Field data collection efforts included location and measurement of sidewalk characteristics, walkability, and aesthetics (Figure 17.7). Data were collected for specific road segments, mapped, and spatially analyzed to examine the possible impact of the “BeltLine” on local residents’ health.


  • GIS can be used to specify the place associated with the case definition.
  • GIS can be informative for planning field data collection methods.
  • The type of analysis will influence spatial data needs and spatial data collection tools.
  • Geographic-level data collected during fieldwork will affect the specificity of visualization and the spatial statistical methods during analysis.
  • Field investigators should think beyond collection of latitude and longitude, point-level data.

Characterizing the Geographic and Sociodemographic Distribution of Disease

Often, the first look at the data involves creating a map visualizing the disease distribution. Maps can comprise points representing the location of each case or display the geographic distribution of rates or changes in the distribution of counts or rates across time (1–3,8). Both count and rate data can be aggregated to different geographic units (e.g., census tracts, counties, or zip codes). The technique known as choropleth mapping visualizes the intensity of the counts or rates by using boundary aggregations (Figure 17.8) (1–3,8). Selecting classification breakpoints and color schemes is a chief consideration (8,19).
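To make the point about classification breakpoints concrete, the sketch below assigns areas to equal-count (quantile) classes by rank, which is one common choice for choropleth maps. The county names and rates are invented, the function name is my own, and other schemes such as equal intervals or natural breaks would place the same areas in different classes and so produce a different-looking map.

```python
def quantile_classes(rates, n_classes):
    """Assign each area to one of n_classes equal-count classes by rank.

    rates maps area name to rate. Class 0 holds the lowest rates.
    Tied rates are split in sorted order, which is arbitrary.
    """
    if n_classes < 1 or n_classes > len(rates):
        raise ValueError("n_classes must be between 1 and the number of areas")
    ordered = sorted(rates, key=rates.get)
    return {
        area: rank * n_classes // len(ordered)
        for rank, area in enumerate(ordered)
    }


# Hypothetical rates per 100,000 for eight areas.
rates = {
    "County A": 4.2, "County B": 11.8, "County C": 7.5, "County D": 2.1,
    "County E": 19.4, "County F": 9.0, "County G": 5.6, "County H": 14.3,
}

classes = quantile_classes(rates, n_classes=4)
for area in sorted(classes, key=classes.get):
    print(area, classes[area])
```

Each class index would then be mapped to one step of a sequential color scheme, with darker shades conventionally used for higher rates.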

Analyses do not have to be restricted to commonly used geopolitical boundaries. For example, mapping the accumulation of cases among homes within a village might be useful. With this information, choropleth maps of the number of cases, or rates, within each home can be compared with the quantity in other homes within the study area. Another possibility is for the map to represent the location of cases in rooms in a building (e.g., in a hospital or nursing home).

GIS Operations and Their Utility

Point-level analyses of cases can provide an overview of the extent of disease distribution. Point-level data also are needed for evaluating spatial clustering of disease. Alternatively, service area or activity space analyses can help characterize the extent of disease distribution on a more relative and temporal scale. As previously mentioned, another advantage of GIS is incorporating other spatially related information into the analysis, thus providing context for disease patterns and insights regarding place-based risk factors. For example, during the 2016 Flint, Michigan, shigellosis outbreak, cases were aggregated by census area for reporting and visualization. In doing so, the team was able to examine the case rates in relation to reported water-quality events, thus leading to more in-depth spatial analysis (20).

Providing Context with Supplemental Data

Analyzing supplemental data (e.g., environmental or infrastructure data) enables further contextualization of the public health problem. For example, during the investigation of elevated lead levels in Flint, Michigan, water supply system data were important for understanding the common source of contamination and identifying particularly vulnerable populations (i.e., child residents). Similarly, waterline information was used to model chlorine residuals to understand a later outbreak of shigellosis in the same area (20).

Another resource is remotely sensed data. Remotely sensed data can include aerial and satellite images, or they can be data collected by sensors on satellites orbiting in space (2). Remote sensing techniques can aid in locating key geographic features or monitoring change across time. Imagery can be particularly useful in preparing for responses to natural disasters by providing an aerial view of environmental and infrastructural damage and stranded populations. For example, after Hurricane Harvey’s landfall in Texas in 2017, field investigators analyzed satellite imagery to predict and prevent mold exposure. After the 2010 earthquake in Haiti, satellite imagery was used to locate stranded populations and to identify the locations to which affected residents were moving to find shelter. During the 2016–2017 Zika virus infection response in Puerto Rico, spectral signature remote sensing techniques were used to locate standing water, which served as a breeding ground for Aedes aegypti mosquitoes potentially carrying the Zika virus (Figure 17.9).

Visualizing Disease Across Time

GIS can be used to visualize disease progression, changing concentrations, or distribution of risk factors across time. Static map series, linked interactive micromaps, and animations are methods for such visualization. An animation of the spread of Ebola virus infection among households and the institution of household-wide and village-wide isolation and quarantine efforts in Sierra Leone was particularly informative in understanding the outbreak’s epidemiologic curve (14). New tools are also being developed to visualize the slope of an epidemiologic curve for every geographic unit within a study area. As the direction and magnitude of this slope are mapped, a visualization of the stage, magnitude, and geographic distribution of an outbreak can be realized.
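The text does not say how those tools compute the slope, so the following is only one plausible sketch: an ordinary least-squares trend line fitted to each unit's weekly case counts, with the sign of the slope read as rising or falling. District names and counts are invented, and the function names are my own.

```python
def trend_slope(counts):
    """Least-squares slope of case counts against time step (0, 1, 2, ...)."""
    n = len(counts)
    if n < 2:
        raise ValueError("need at least two time points")
    mean_t = (n - 1) / 2
    mean_c = sum(counts) / n
    numerator = sum((t - mean_t) * (c - mean_c) for t, c in enumerate(counts))
    denominator = sum((t - mean_t) ** 2 for t in range(n))
    return numerator / denominator


def describe(slope):
    """Label a slope as rising, falling, or flat."""
    if slope > 0:
        return "rising"
    if slope < 0:
        return "falling"
    return "flat"


# Hypothetical weekly case counts for three districts.
weekly_cases = {
    "District 1": [1, 2, 3, 4],
    "District 2": [8, 6, 4, 2],
    "District 3": [5, 5, 5, 5],
}

slopes = {unit: trend_slope(counts) for unit, counts in weekly_cases.items()}
for unit, slope in slopes.items():
    print(f"{unit}: {slope:+.1f} cases/week ({describe(slope)})")
```

Joining the slope values to the units' boundary file would then allow them to be drawn as a choropleth, with one color ramp for rising units and another for falling ones.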

Mapping the Cholera Epidemic of 1854

Students will create a basic web map of the 1854 John Snow cholera investigation using the Story Map Basic template.

Biology, Health, Geography, Geographic Information Systems (GIS)

Tech Vs Natural Selection

This illustration depicts scientists using technology to track the spread of disease.

Illustration by Owen Freeman

Dr. John Snow is regarded as one of the founding fathers of modern epidemiology. During a major cholera epidemic in 1854 London, he collected and mapped data on the locations (street addresses) where cholera deaths occurred. His process was laborious and slow, but ultimately very informative. His painstaking and detailed analysis led to the identification of the epidemic’s source—a contaminated public water source. Today, John Snow’s data has been geocoded, making it accessible in a GIS. In this lesson, you will create a heat map showing the locations that experienced the highest number of cholera deaths in the epidemic. You will share this heat map as a basic story map. [Note: You can watch a short overview of John Snow’s work in Episode 4 of The Geospatial Revolution.]
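Outside of a web-mapping template, the idea behind a heat map of deaths can be sketched as a simple grid count: divide the area into square cells and sum the deaths recorded in each. The coordinates and counts below are invented for illustration and are not John Snow's geocoded data; the cell size and function name are also assumptions.

```python
from collections import Counter


def grid_counts(points, cell_size):
    """Sum deaths per square grid cell.

    points is an iterable of (x, y, deaths) with x and y in metres from an
    arbitrary origin. Returns a Counter keyed by (column, row) cell index.
    """
    counts = Counter()
    for x, y, deaths in points:
        cell = (int(x // cell_size), int(y // cell_size))
        counts[cell] += deaths
    return counts


# Hypothetical address points: (x in metres, y in metres, deaths at address).
addresses = [
    (12.0, 8.0, 3),
    (40.0, 30.0, 5),
    (60.0, 20.0, 1),
    (130.0, 90.0, 2),
    (55.0, 45.0, 4),
]

counts = grid_counts(addresses, cell_size=50)
hottest_cell, hottest_total = counts.most_common(1)[0]
print(f"Hottest cell {hottest_cell} with {hottest_total} deaths")
```

A GIS heat map typically smooths these counts with a kernel rather than using hard cell edges, but the cell with the highest total plays the same role as the darkest spot on the map.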

A decade of systems biology

Systems biology provides a framework for assembling models of biological systems from systematic measurements. Since the field was first introduced a decade ago, considerable progress has been made in technologies for global cell measurement and in computational analyses of these data to map and model cell function. It has also greatly expanded into the translational sciences, with approaches pioneered in yeast now being applied to elucidate human development and disease. Here, we review the state of the field with a focus on four emerging applications of systems biology that are likely to be of particular importance during the decade to follow: (a) pathway-based biomarkers, (b) global genetic interaction maps, (c) systems approaches to identify disease genes, and (d) stem cell systems biology. We also cover recent advances in software tools that allow biologists to explore system-wide models and to formulate new hypotheses. The applications and methods covered in this review provide a set of prime exemplars useful to cell and developmental biologists wishing to apply systems approaches to areas of interest.


Meta-analysis of systems biology publications over the past decade. (a) A…

Overview of the experimental process in classical biology (top) versus systems…

Predictive subnetwork markers for breast cancer metastasis. (ac) Subnetworks…

(a) Complexes associated with RAD6-C histone ubiquitination. Protein-protein interactions are enriched…

A model of mitotic regulation by Ras. (a) BI-2536, a…

A systematic strategy for network reconstruction. (a) Cell state is measured…

Core embryonic regulatory networks for cell fate decisions. (a) High-confidence protein-protein…

Graphical user interface of Cytoscape. Each window showcases a different analysis or visualization…

Screenshot of Cell Designer when drawing a network as process diagrams.

Screenshot of Cell Designer when simulating a network model given different input parameters.


Though few think of the U.S. government as “extremely online,” its agencies can access more data than Google and Facebook combined. Not only do its agencies maintain their own databases of ID photos, fingerprints and phone activity, government agents can get warrants to obtain data from any American data warehouse. Investigators often reach out to Google’s warehouse, for instance, to get a list of the devices that were active at the scene of a crime.

Though many view such activity as an invasion of privacy, the U.S. has minimal privacy regulations. Even California’s radical new privacy law offers citizens no protections against government monitoring. In short, the government’s data well won’t run dry anytime soon.

Here are some of the ways government agencies apply data science to vast stores of data.


Equivant: Data-Driven Crime Predictions

Location: Canton, Ohio

How it uses data science: Widely used by the American judicial system and law enforcement, Equivant’s Northpointe software suite attempts to gauge an incarcerated person’s risk of reoffending. Its algorithms predict that risk based on a questionnaire that covers the person's employment status, education level and more. No questionnaire items explicitly address race, but according to a ProPublica analysis that was disputed by Northpointe, the algorithm was 77 percent more likely to peg black defendants than white defendants as at higher risk of committing a future violent crime, even after controlling for age, gender and criminal history. ProPublica also found that Equivant's predictions were roughly 60 percent accurate.


ICE: Facial Recognition in ID Databases

Location: Washington, D.C.

How it uses data science: U.S. Immigration and Customs Enforcement, a.k.a. ICE, has used facial recognition technology to mine driver’s license photo databases in at least two states, with the goal of deporting undocumented immigrants. The practice, which has drawn criticism on both ethical and technological grounds (facial recognition technology remains shaky), falls under the umbrella of data science. Facial recognition builds on photos of faces, a.k.a. raw data, with AI and machine learning capabilities.


IRS: Evading Tax Evasion

Location: Washington, D.C.

How it uses data science: Tax evasion costs the U.S. government $458 billion a year, by one estimate, so it’s no wonder the IRS has modernized its fraud-detection protocols in the digital age. To the dismay of privacy advocates, the agency has improved efficiency by constructing multidimensional taxpayer profiles from public social media data, assorted metadata, email analysis, electronic payment patterns and more. Based on those profiles, the agency forecasts individual tax returns; anyone whose real return differs wildly from the forecast gets flagged for auditing.
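The flagging step described here amounts to a deviation check between a forecast and a reported figure. The following is a minimal sketch, assuming each taxpayer has one forecasted and one reported amount. The 50 percent threshold, the `flag_for_audit` and `relative_gap` names, and the sample numbers are all hypothetical; nothing here reflects the IRS's actual models or data.

```python
def relative_gap(forecast, reported):
    """Absolute gap between the two figures as a fraction of the forecast."""
    return abs(reported - forecast) / forecast


def flag_for_audit(records, threshold=0.5):
    """Return IDs whose reported figure deviates from the forecast by more
    than `threshold` (a hypothetical cutoff, expressed as a fraction)."""
    return [
        taxpayer_id
        for taxpayer_id, forecast, reported in records
        if forecast > 0 and relative_gap(forecast, reported) > threshold
    ]


# Made-up records: (taxpayer ID, forecasted amount, reported amount).
records = [
    ("A", 50000, 48000),
    ("B", 80000, 30000),
    ("C", 40000, 41000),
]

flagged = flag_for_audit(records)
print(flagged)
```

A relative gap is used instead of an absolute one so that the same cutoff applies sensibly to small and large returns.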

Diseases of the hematologic, immunologic, and lymphatic systems (multisystem diseases)

Benjamin W. Newcomer, ... Misty A. Edmondson, in Sheep, Goat, and Cervid Medicine (Third Edition), 2021


Control of epizootic hemorrhagic disease (EHD) is difficult and relies on a combination of disease surveillance, vector control, and potentially, vaccination. Eradication of vector-borne diseases from endemic areas is difficult and time-consuming, and thus, disease control is likely more attainable than strict eradication. Vector control is more important in the late summer and fall, when vector populations are at peak levels and viral transmission is more likely. Midge-proofed housing and the treatment of animals with pyrethroid insecticides have been attempted but may be logistically challenging and have yet to be demonstrated efficacious. Vaccine availability in North America is limited, but inactivated autogenous vaccines have been developed from isolates obtained from ill or recently deceased animals. Autogenous vaccines are tested for purity but not necessarily for efficacy. Vaccine usage must be approved by the U.S. Department of Agriculture prior to administration.

What Is Big Data In Healthcare?

Big data in healthcare is a term used to describe the massive volumes of information created by the adoption of digital technologies that collect patients' records and help in managing hospital performance, volumes that would otherwise be too large and complex for traditional technologies.

The application of big data analytics in healthcare has many positive, even life-saving, outcomes. In essence, big data refers to the vast quantities of information created by the digitization of everything, which are consolidated and analyzed by specific technologies. Applied to healthcare, it uses the health data of a population (or of a particular individual) and can potentially help to prevent epidemics, cure disease, cut down costs, etc.

Now that we live longer, treatment models have changed, and many of these changes are driven chiefly by data. Doctors want to understand as much as they can about a patient, as early in the patient's life as possible, so they can pick up warning signs of serious illness as they arise; treating a disease at an early stage is far simpler and less expensive. With key performance indicators and healthcare data analytics, providers can act on the principle that prevention is better than cure, and a comprehensive picture of a patient lets insurers offer a tailored package. This is the industry’s attempt to tackle the silo problem that affects patient data: bits and bytes of it are collected everywhere and archived in hospitals, clinics, surgeries, etc., which cannot communicate with one another properly.

Indeed, for years gathering huge amounts of data for medical use has been costly and time-consuming. With today’s always-improving technologies, it becomes easier not only to collect such data but also to create comprehensive healthcare reports and convert them into relevant critical insights, which can then be used to provide better care. This is the purpose of healthcare data analytics: using data-driven findings to predict and solve a problem before it is too late, but also to assess methods and treatments faster, keep better track of inventory, involve patients more in their own health, and empower them with the tools to do so.


In 2016, the NHLBI released its Strategic Vision, which will guide the Institute’s research activities for the coming decade. Many of the objectives and compelling questions identified in the plan focus on factors that account for differences in health among populations. For example, researchers are looking at what factors make individuals or populations resistant or prone to diseases, despite having experienced the same exposures such as diet, smoking, environmental and social factors. Recruiting and retaining researchers interested in epidemiology research and developing a diverse scientific workforce are also high priorities.

Genes and biology may account for some differences in health among different populations. However, a wide range of factors related to lifestyle choices, behaviors, and socioeconomic status may also play a role in causing differences in health. Our research seeks to better understand the causes of health differences and to identify ways to improve public health.

Population studies have entered an exciting period when advances in assay methods, imaging technologies, and electronic data are creating new scientific opportunities. These tools make it possible for large epidemiology studies to explore what makes individuals susceptible to disease. To capitalize on these opportunities, NHLBI established an Advisory Council Working Group on Epidemiology and Population Science, which looked at the current landscape, emerging tools, and future opportunities in population science and made important recommendations that contributed to the Institute’s strategic thinking in this area.

The NHLBI’s large-population cohort studies have been major generators of new knowledge that has informed the molecular basis for disease and identified targets for new treatments. For example, NHLBI research has transformed the way the public approaches cardiovascular disease by conducting numerous studies that focus on diverse populations. The Women’s Health Initiative (WHI) continues to yield new insights that advance our understanding of heart disease and other diseases in women.

It is important that the NHLBI continue to build on its legacy of excellence in population studies research. Our population studies have led to a wide range of discoveries and initiatives that will reduce health disparities and improve health outcomes in heart and vascular diseases, obesity, women’s health, and precision medicine.

