Data sources to help manage variability that won’t break the bank - what's out there?

Take home messages

  • The pool of publicly available off-farm data that may be relevant to combine with on-farm data is increasing and can now be swiftly gathered for any farm or field. Collecting and using this data to make more informed decisions is an opportunity for growers, and opportunities for its inclusion in decision-making tools need to be explored
  • Machine learning and hybrid models derived from large data sets and field validation should be tested against the crop simulation models currently used to estimate yield potential and input requirements/crop response
  • These powerful techniques make it possible to analyse the drivers of variability in crop performance and profit using large data sets that cover a local area, rather than only the individual field data used in current Precision Agriculture practice
  • Building a freely available weather database at a much finer scale than is available now to improve predictions should be an industry and government imperative
  • Using power output or fuel use data recorded while working with ground-engaging implements may be a low-cost, novel way to map changes in soil strength/type.


Australian broadacre crop production currently provides approximately 35% of agricultural GDP, and the nation's farmers export approximately 75% of production into competitive international markets. This is achieved with a very low level of external financial support compared to almost all competitors (Producer Support Estimate (PSE) 2017: Australia = <2%; OECD = 18%; USA and Canada = 10%; Kazakhstan = 4%). Optimum business performance in a competitive environment requires the application of relevant information to critical decisions about improving efficiencies and production quantity/quality. In cropping businesses, which operate in a variable environment, information on variability in resources, environmental conditions and output is an important component of that relevant information.

Low cost information

Yield monitor data

In terms of high-value, low-cost information for broadacre cropping, yield monitor data should be high on a cropping manager's list. With the overwhelming majority of farms already accessing high-accuracy global navigation satellite systems (GNSS), and yield monitors becoming standard equipment on most harvesters, the yield mass (t/ha), grain moisture (%) and elevation (m) data recorded automatically by these systems during harvest operations comes at a low financial cost. The elevation data can be used to produce a range of useful information relating to changes in the landscape and their impact on soil development, water movement and solar radiation aspect. Calibrated crop yield and moisture data directly record variability in production across fields and years. High spatial resolution yield data is a simple yet crucial means of monitoring or modelling the effect of management changes on production, and is a layer of data vital for ground-truthing data gathered from off-farm.

Figure 1. Paddock imagery on Google Earth (a), elevation data (b), crop yield data (c) and grain moisture (d).
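As an illustration of how harvester elevation data can be turned into landscape information, the following is a minimal sketch that derives slope and aspect from a gridded elevation layer using central differences. It assumes the GNSS elevation points have already been interpolated onto a regular grid; the function name `slope_aspect` and the grid layout (row 0 at the northern edge) are assumptions made for this example, not part of any particular yield monitor software.

```python
import math

def slope_aspect(dem, cell_size):
    """Return (slope, aspect) grids in degrees for the interior cells of a DEM.

    dem: list of rows of elevations (m); row 0 is assumed to be the northern edge.
    cell_size: grid spacing (m).
    Edge cells are left as None because central differences need neighbours.
    """
    rows, cols = len(dem), len(dem[0])
    slope = [[None] * cols for _ in range(rows)]
    aspect = [[None] * cols for _ in range(rows)]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            # Elevation change per metre towards the east and towards the north.
            dz_de = (dem[r][c + 1] - dem[r][c - 1]) / (2 * cell_size)
            dz_dn = (dem[r - 1][c] - dem[r + 1][c]) / (2 * cell_size)
            slope[r][c] = math.degrees(math.atan(math.hypot(dz_de, dz_dn)))
            if dz_de == 0 and dz_dn == 0:
                aspect[r][c] = None  # flat cell: no downslope direction
            else:
                # Downslope direction as a compass bearing (0 = north, 90 = east).
                aspect[r][c] = math.degrees(math.atan2(-dz_de, -dz_dn)) % 360
    return slope, aspect

# Hypothetical 3 x 3 grid rising 1 m per 10 m cell towards the east.
demo_dem = [[0, 1, 2], [0, 1, 2], [0, 1, 2]]
demo_slope, demo_aspect = slope_aspect(demo_dem, cell_size=10)
```

In this toy grid the centre cell faces west, because the land rises towards the east. Real elevation layers are noisier, so some smoothing before differencing is usually worthwhile.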

Vehicle performance data

Performance data is routinely recorded by newer tractors and self-propelled implements. Data on variation in fuel use and other relevant operational parameters can have economic and efficiency dividends. Novel ways to use this free data include using power output or fuel use while working with ground-engaging implements to map changes in soil strength/type (Figure 2). There is also the potential to use the fuel use data in a carbon and nitrogen auditing process.

Figure 2. Tractor engine load recorded during the sowing operation (a) and soil apparent electrical conductivity (ECa) (b). Correlation coefficient = 0.85.
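The paper does not state how the correlation coefficient in Figure 2 was calculated. As a sketch only, assuming it is a Pearson correlation between engine load and ECa values sampled at the same locations, the calculation could look like the following. The function name and the sample values are hypothetical.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length (at least 2 values)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of squares and cross-products of deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)

# Hypothetical co-located samples: engine load (%) and ECa (mS/m).
engine_load = [52.0, 58.5, 61.0, 70.2, 74.8]
eca = [18.0, 25.5, 24.0, 39.0, 41.5]
r = pearson_r(engine_load, eca)
```

Both layers need to be sampled at matching points (or interpolated to a common grid) before the values are paired.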

Publicly available data

The progress towards increasing use of digital data in agriculture is being led by a combination of improvements in sensor development, computing power, data storage/delivery, data analytical techniques and reduced costs. The synchronisation of these occurrences has in turn fuelled a greater interest in the data and its potential, thereby stimulating more development in all areas. A complementary benefit has been a rising number of data sources being made publicly, and more easily, available. Table 1 records a number of the most relevant as of early 2019.

Publicly available information can be downloaded from a range of individual providers (e.g. Geoscience Australia, CSIRO, ESA and NASA). This can be done on a number of platforms (e.g. Python, R and Android) using Application Programming Interfaces (APIs).

Table 1. Public sources of data for potential use in describing variability in resources and production.




| Source | Data | Spatial Resolution | Temporal Resolution |
|---|---|---|---|
|  | Bulk density, organic carbon, clay, sand, silt, pH, available water capacity, total N, total P, effective cation exchange capacity, depth of regolith, soil depth, coarse fragments | 90 x 90 x 2 m |  |
| Gridded Daily Data | Rainfall, temperature, vapour pressure, solar exposure, NDVI, atmospheric circulation | 5 x 5 km | Daily from 1889 |
|  | Digital Elevation Model (DEM) | 90 x 90 m |  |
| Geoscience Australia | Digital Elevation Models (DEM) and (DEM-S) | 5 x 5 m |  |
| Geoscience Australia | Hydrological DEM (hydrological features are enhanced) | 30 x 30 m |  |
| Sentinel 2 | 13 bands from ~430 to 2190 nm |  |  |
| Geoscience Australia | Radiometric map of Australia | 100 x 100 m |  |
|  | 14 bands from visible to thermal IR | 15-30-90 m | Weekly from 2000 |
|  | 36 spectral bands | 250-500-1000 m | Weekly until 2010 |
|  | 8 bands from ~450 to 2350 nm + ~10,400 to 12,500 nm | 15-30-60 m | Every second week |
|  | 11 bands from ~435 to 2294 nm + ~10,600 to 12,500 nm | 15-30-60 m | Every second week |
|  | Soil moisture and Carbon Net Ecosystem Exchange | 9 x 9 km | Weekly and every second week |
|  | Futures and commodity exchange data |  |  |



In 2010, Google released the “Google Earth Engine”, a platform dedicated to providing access to a multitude of different data layers (including most of those listed in Table 1) at the cost of registration time only. A substantial advantage of this platform is that all the information is stored in a database standardised to different resolutions, in the same geographic reference system. Data can be downloaded for user-specified locations and at user-specified resolutions for use as required. For example, yield data (or the location of yield data points) can be uploaded to Google Earth Engine as private data, and the public data layers can then be extracted to match the geographic locations and extent of the input layers. Alternatively, boundaries of farms or fields can be used to clip and extract data layers. The platform also enables real-time analysis using Google’s computing infrastructure, which runs processes across thousands of computers in parallel, so that large analytical tasks can be performed or task times drastically reduced.
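The clipping step described above can be illustrated outside any particular platform. The following is a minimal plain-Python sketch, not the Google Earth Engine API: it keeps only those data points that fall inside a field boundary, using a standard ray-casting point-in-polygon test. The function names, the coordinates and the assumption that everything is already in one projected coordinate system are all hypothetical.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test: is the point (x, y) inside the polygon?

    polygon: list of (x, y) vertices in order; the last vertex is joined
    back to the first automatically.
    """
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does a horizontal ray running east from the point cross this edge?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def clip_to_boundary(records, polygon):
    """Keep only the (x, y, value) records that fall inside the boundary."""
    return [rec for rec in records if point_in_polygon(rec[0], rec[1], polygon)]

# Hypothetical square field boundary and data points (x, y, value) in metres.
field = [(0, 0), (10, 0), (10, 10), (0, 10)]
points = [(5, 5, 1.2), (15, 5, 0.9), (2, 8, 1.5)]
clipped = clip_to_boundary(points, field)
```

On a hosted platform the same operation would normally be done server-side against the stored layers; the sketch only shows the geometry involved.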

Machine learning and data fusion approaches

The increase in availability of digital data and processing capabilities is leading to the application of data fusion techniques and machine learning to search for new insights from the data. There have been significant developments in machine learning analytical methods, which differ from mechanistic or process-based models that are commonly used in cropping because they use data-driven approaches to discover relationships between variables.

A major advantage is that they can make use of both quantitative and qualitative data from a wide range of data sources. On-farm data from sensors currently used in precision agriculture, along with what will be an increasing variety of sources, volumes, scales and structures of off-farm data (from other local/regional farms and the non-farm domains shown above) can be input into analysis and decision-making back on-farm.
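To make the contrast with process-based models concrete, the following is a minimal sketch of one simple data-driven method, k-nearest-neighbour regression. It predicts yield at a new location from the yields recorded at the most similar locations in a training set. This is a stand-in chosen for brevity, not the method used in any program described in this paper, and the covariates and values are hypothetical.

```python
import math

def knn_predict(train_features, train_targets, query, k=3):
    """Predict a target as the mean of the k most similar training records.

    train_features: list of equal-length numeric tuples (one per record).
    train_targets: list of observed values (e.g. yield in t/ha).
    query: numeric tuple with the same covariates as the training records.
    """
    if not 1 <= k <= len(train_features):
        raise ValueError("k must be between 1 and the number of training records")
    # Euclidean distance in covariate space; no process model is assumed.
    ranked = sorted(
        (math.dist(features, query), target)
        for features, target in zip(train_features, train_targets)
    )
    nearest = ranked[:k]
    return sum(target for _, target in nearest) / k

# Hypothetical, already-rescaled covariates (e.g. elevation, ECa) and yields.
features = [(0, 0), (1, 0), (0, 1), (10, 10)]
yields = [2.0, 2.2, 2.4, 5.0]
predicted = knn_predict(features, yields, query=(0.1, 0.1), k=3)
```

Covariates measured in different units should be rescaled to comparable ranges first, otherwise the largest-valued layer dominates the distance calculation. Qualitative data (such as soil class) would need to be encoded numerically before it could be used this way.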

GRDC future farm program

The future farm program (CSIRO, USYD, USQ, QUT, AGVIC) aims to utilise this off-farm data and historical on-farm data, and to re-examine and improve the way in which current in-season field-monitored data (soil, crop, climatic) are used to inform decisions about input management. The outcome should be a way of automating the process from data acquisition, through analysis, to the formulation and implementation of decision options with manager input. The initial focus is on improving the efficiency and profitability of applied nitrogen (N).

The initial operational parameters are:

  • N fertiliser decision making should be supported by measures of plant N status (which in turn requires estimation of biomass), soil N status and soil water status/availability i.e. a multi-sensor approach is required
  • In-season sensor data will be a key input, and machine learning methods of data integration will be employed to develop location-specific decision options
  • Both remote and proximal sensing of the crop canopy will make an important contribution to N fertiliser decision making, but this should be supported by some form of on-farm experimentation, with a zero N treatment (plot or strip) a critical enabler for interpretation
  • The process should be deployable in a way that will be complementary to the inclusion of other inputs/assessments that managers may also bring to bear in decision making.


The research undertaken as part of this project is made possible by the significant contributions of growers through both trial cooperation and the support of the GRDC. The authors would like to thank them for their continued support.

Contact details

Brett Whelan
Precision Agriculture Laboratory, Sydney Institute of Agriculture, The University of Sydney
Biomedical Building, Australian Technology Park, Eveleigh NSW 2015
Ph: 02 8627 1132

GRDC code: 9176493

GRDC Project Code: CSP1803-020RMX