Accessing individual fields in GRIBs (even remotely)
A GRIB file consists of one or more messages, or records. Each message consists of a 2-dimensional grid of numeric values and all the metadata to required to identify, geolocate, and unpack the data*. Each message is completely independent of the others, meaning that you can take a message out of any GRIB file and add it to another, or write it to its own file, and any GRIB reading software will handle it just fine.
This independence is what allows us to directly access any field in a GRIB file directly. If you know the start and ending byte location of a message for a field you want, you simply need to read those specific bytes. But how do you know the byte locations for the field you want?
Enter the GRIB index file
This is a section of the file listing of the GFS data from NCEP's NOMADS http server:
Notice that for every GRIB file (the large files with no file extension) there is an equivalent ".idx" file. These are index files (a.k.a inventory files) - text-based files that list every message in the matching GRIB file. An excerpt from one of these files is below:
1:0:d=2022102106:PRMSL:mean sea level:1 hour fcst:
2:982989:d=2022102106:CLWMR:1 hybrid level:1 hour fcst:
3:1082361:d=2022102106:ICMR:1 hybrid level:1 hour fcst:
4:1362682:d=2022102106:RWMR:1 hybrid level:1 hour fcst:
5:1632677:d=2022102106:SNMR:1 hybrid level:1 hour fcst:
6:1736762:d=2022102106:GRLE:1 hybrid level:1 hour fcst:
7:1792774:d=2022102106:REFD:1 hybrid level:1 hour fcst:
8:2624783:d=2022102106:REFD:2 hybrid level:1 hour fcst:
9:3457153:d=2022102106:REFC:entire atmosphere:1 hour fcst:
10:4356493:d=2022102106:VIS:surface:1 hour fcst:
11:5094001:d=2022102106:UGRD:planetary boundary layer:1 hour fcst:
12:5682189:d=2022102106:VGRD:planetary boundary layer:1 hour fcst:
13:6267979:d=2022102106:VRATE:planetary boundary layer:1 hour fcst:
Fields are delimited by colons; you can clearly see the model date, field name, vertical coordinate, and forecast hour information. The first column is the message number, and the 2nd column is the starting byte of the message in the file. So let's say you want to access the surface visibility, which the index file shows as being message number 10:
10:4356493:d=2022102106:VIS:surface:1 hour fcst:
11:5094001:d=2022102106:UGRD:planetary boundary layer:1 hour fcst:
The first byte of the message is number 4356493 and the last is 5094000 (the starting byte of the next message minus 1). Knowing this, we could read just those bytes and write them out to a new file using whatever language floats your boat, for example in python:
input_grib_file = 'gfs.t06z.pgrb2.0p25.f001'
output_grib_file = 'gfs.t06z.pgrb2.0p25.visibility.f001'
start_byte = 4356493
end_byte = 5094000
message_byte_length = end_byte - start_byte + 1
# Open the file for reading in binary mode
with open(input_grib_file, 'rb') as _in:
# jump to the first byte of the message we want to extract
_in.seek(start_byte)
# read in the number of bytes equating to the length of the message
message_bytes = _in.read(message_byte_length)
# Write the message bytes to a new file
with open(output_grib_file, 'wb') as _out:
_out.write(message_bytes)
Downloading individual fields remotely
The same approach can be used to download specific messages from a remote GRIB file. The only special requirement is that the remote web server supports range requests, which most do (NOMADS does, as do Amazon S3 and Google Cloud Storage). To download just that same visibility message from NOMADS using cURL you would do something like this:
curl https://nomads.ncep.noaa.gov/pub/data/nccf/com/gfs/prod/gfs.20221021/06/atmos/gfs.t06z.pgrb2.0p25.f001 -H "Range: bytes=4356493-5094000" --output gfs.t06z.pgrb2.0p25.visibility.f001
Output:
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 720k 100 720k 0 0 1802k 0 --:--:-- --:--:-- --:--:-- 1805k
720 kilobytes downloaded - way better than downloading the entire 518 megabyte file!
But it's not all roses...
While index files are commonly found alongside the GRIB data they describe, they're not part of the actual GRIB format. It's up to the data provider to generate the index files, and while it's generally in their favor to do so as it can significantly reduce load, they may not always be available. Generally speaking, you can't use the index file from one GRIB file to identify the byte ranges of the fields you want in another. GRIB files use a variety of compression algorithms and so while the order of messages generally remains consistent from one file to the next of a certain product (ex: the fields in GFS files are usually in the same order until an updated version is released), the byte ranges of the fields almost never match exactly from one file to the next.
* A GRIB message may also contain submessages which are additional 2D grids and their associated metadata. These are uncommon and typically used to group related fields, such as the U and V wind fields.
Comments ()