Quantity vs Quality in Coffee Data | by Robert McKeon Aloe | Mar, 2023

Coffee Data Science

My experimental data collection

In coffee, taste is king, but quantifiable data on extraction efficiency using Total Dissolved Solids (TDS) has been a useful tool for evaluating hardware and techniques. TDS is measured with a refractometer, and a digital refractometer is generally preferred over an optical one.

In the past year, the cost of a digital refractometer has dropped significantly. DiFluid has come out with two refractometers for much less than the standard VST or Atago. Currently, the data suggests the DiFluid R2 is as capable as the VST or Atago. I think this refractometer raises an interesting question about how accessible coffee data is becoming: what’s more important with respect to data collection routines, quality or quantity?

An example of some data, all images by author.

To recap, refractometers measure Total Dissolved Solids (TDS), a great metric for understanding coffee strength and calculating extraction efficiency. It has become a vital tool for me in my explorations.

To be clear, I only make espresso at a high strength (12% to 20% TDS at 16% to 24% EY), and refractometers may have other challenges for lower strength brews like filter coffee. However, I do not address those topics.
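
The arithmetic linking these two numbers is simple: extraction yield (EY) is the mass of dissolved solids in the cup divided by the dry coffee dose, which TDS gives directly. A minimal sketch, with an illustrative dose and beverage weight (the specific numbers are not from the article):

```python
def extraction_yield(tds_percent: float, beverage_mass_g: float, dose_g: float) -> float:
    """EY% = TDS% * beverage mass / dry coffee dose."""
    return tds_percent * beverage_mass_g / dose_g

# Illustrative espresso: 18 g dose, 24 g beverage out, 15% TDS
ey = extraction_yield(15.0, 24.0, 18.0)
print(f"EY: {ey:.1f}%")  # 20.0%, inside the 16%-24% range above
```

Note that EY is more sensitive to a TDS error at higher beverage-to-dose ratios, since the TDS reading gets multiplied by a larger number.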

I have not yet published a routine for how I collect a TDS measurement with a refractometer, even though I have three digital ones: Atago, DiFluid, and DiFluid R2. I’ve been working through multiple explorations to justify with data whether each part of my routine is valuable relative to the time it takes to collect a sample, such as:

  1. Cooling a sample to a given temperature (usually the calibration temperature).
  2. Filtering samples with a syringe filter.
  3. Using a new pipette for every sample.
  4. Cleaning the glass sample dish with alcohol.
  5. Calibrating the device before every sample.

A routine with all of these steps could take a while, and the effect is that less data can be collected in the same period of time.

I’ve been doing data work for a long time (over a decade). One of the issues that typically comes up in a user study is quality of data vs. quantity. To get better-quality data, a person has to comply more closely with the protocol, but compliance costs time. However, once a machine learning algorithm is applied, a certain amount of noise enters the data anyway. It turns out that more data, even if it is lower quality, can be more desirable for some experiments, because nobody has all day to run the full protocol.

So even if there is noise in the signal, collecting more samples at a faster rate could allow the noise to be averaged out. I like applying this to coffee as well because I have other things I want to do in life.

Not everyone has the lab, money, and time to control all variables, so control as much as you can. Even if you have noise, as long as that noise is consistent, the measurement is more controlled than random. The worst case is systematic bias in the noise.
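
This can be sketched numerically: random noise shrinks as more readings are averaged (roughly by the square root of the sample count), while a constant bias survives averaging but cancels when comparing two techniques measured the same way. All of the noise and bias levels below are made-up illustrations, not measured device characteristics:

```python
import random
import statistics

random.seed(0)

TRUE_TDS = 15.0   # hypothetical true strength, % TDS
NOISE_SD = 0.3    # hypothetical per-reading random noise, % TDS
BIAS = 0.2        # hypothetical constant device bias, % TDS

def reading() -> float:
    """One simulated refractometer reading: truth + bias + random noise."""
    return TRUE_TDS + BIAS + random.gauss(0, NOISE_SD)

def averaged(n: int) -> float:
    """Mean of n readings."""
    return statistics.mean(reading() for _ in range(n))

# Random noise averages out with more samples...
one = statistics.stdev(averaged(1) for _ in range(500))
many = statistics.stdev(averaged(16) for _ in range(500))
print(f"spread of single readings:  {one:.3f}")
print(f"spread of 16-sample means:  {many:.3f}")  # roughly one quarter of the above

# ...but a consistent bias does not; it only cancels in relative comparisons.
a = averaged(100)           # technique A
b = averaged(100) + 0.5     # technique B, truly 0.5% TDS stronger
print(f"measured A-B difference: {b - a:.2f}")  # near the true 0.5 despite the bias
```

This is why consistent noise is tolerable for relative comparisons, whereas a bias that drifts between experiments would corrupt the comparison itself.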

Another piece to consider is that refractometry for coffee is not well understood. We know there is a connection between refractive index and TDS, but there is still some gray area. Sugar water has a very clear refractive index, but looking through an optical refractometer, coffee does not have as distinct a line.

Do solubles from the beginning of the shot cause the same refraction as solubles from the end of the shot?

How homogeneous is any given coffee drink?

Basically, what is the inherent noise, assuming the refractometer itself is perfect? If this noise is substantially larger than the differences the other protocol steps address, then those steps should be reconsidered.

  1. Do digital refractometers suffer from calibration drift?
  2. Do they age gracefully?
  3. If samples are taken at a higher temperature than calibration, does that matter? Does temperature shift the reading in a consistent way, or is it an uncontrolled variable?
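
One way to approach these questions is to estimate the repeatability floor empirically: take several readings of the same well-stirred sample, compute their standard deviation, and compare that floor to the shift a protocol step produces. The readings and the step effect below are invented placeholders; real values would come from the device:

```python
import statistics

# Hypothetical repeated readings of one well-stirred espresso sample (% TDS)
same_sample = [15.02, 14.98, 15.05, 14.96, 15.01, 15.03]
floor = statistics.stdev(same_sample)

# Hypothetical mean shift from skipping a protocol step (e.g. not cooling)
step_shift = 0.02

print(f"repeatability floor: {floor:.3f}% TDS")
if step_shift < floor:
    print("step effect is below the noise floor; the step may not be worth the time")
else:
    print("step effect exceeds the noise floor; keep the step")
```

The comparison only makes sense if both numbers come from the same device and sampling method, since each contributes its own noise.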

The DiFluid devices are interesting because they also output the refractive index. This can help show whether a change in the reading is caused by temperature or by something else.

I will share my current routine, though it is subject to change. The routine is data-driven; here is the short form, followed by a long form with justification:

  1. Device: DiFluid R2
  2. Calibration: I rarely calibrate my device.
  3. Sample Collection: I stir the sample and use a pipette to collect it. I rinse and reuse the pipette afterwards.
  4. Sample Filtration: I don’t filter my sample.
  5. Sample Temperature: I don’t correct for sample temperature.
  6. Number of Samples: 1
  7. Cleaning the Lens: I use a microfiber towel.

Long Form:

  1. Device: The R2 is at least as accurate as the Atago, and data suggests it is more accurate than the Atago and might be more accurate than the VST. It also produces a reading much faster than the Atago.
  2. Calibration: I rarely calibrate my device. I haven’t tested for calibration drift, but if there is drift, it should affect all my samples equally and average out. If new data on the topic were produced, I would be open to changing my routine.
  3. Sample Collection: I stir the sample and use a pipette to collect it. I don’t like being wasteful with pipettes, so I rinse and reuse them until I decide it is time for a new one. For a sugar test on a refractometer, I use a fresh pipette. How much reuse impacts the sample remains to be determined.
  4. Sample Filtration: I don’t filter my sample. Evidence suggests filtering samples doesn’t improve accuracy, only precision. I usually collect more samples than I need to compensate for the lower precision.
  5. Sample Temperature: I don’t correct for sample temperature. I have looked at sample temperature, and I found a small but statistically significant difference between cooling a sample and using a hot one. However, as long as I do the same thing across all samples, the variable doesn’t affect conclusions, because performance is relative. Oddly enough, I have been doing extract cooling as of late, so my samples have been much cooler than they used to be.
  6. Number of Samples: One. I’m not interested in collecting more, but I have shown in the past that if you leave a sample on the device for a few minutes, it evaporates and the reading changes as a result. I’m not sure taking multiple samples would increase quality either.
  7. Cleaning the Lens: I use a microfiber towel. I don’t use alcohol or alcohol wipes. Glass cleans up well if you pay attention.
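
The trade-off in item 4 (skipping filtration and compensating with more samples) can be sketched: filtered and unfiltered readings share the same mean (accuracy) but differ in spread (precision), and averaging several unfiltered readings recovers the tighter spread. All spreads below are illustrative assumptions, not measured values:

```python
import random
import statistics

random.seed(1)

TRUE_TDS = 15.0       # hypothetical true strength, % TDS
FILTERED_SD = 0.05    # hypothetical spread of filtered readings
UNFILTERED_SD = 0.15  # hypothetical spread of unfiltered readings

def sample(sd: float, n: int = 1) -> float:
    """Mean of n simulated readings with per-reading noise sd."""
    return statistics.mean(TRUE_TDS + random.gauss(0, sd) for _ in range(n))

filtered = [sample(FILTERED_SD) for _ in range(300)]
unf_avg9 = [sample(UNFILTERED_SD, n=9) for _ in range(300)]

# Both are accurate (means near 15.0); averaging 9 unfiltered readings
# brings the spread down to roughly the filtered level (0.15 / sqrt(9) = 0.05).
print(f"filtered:         mean {statistics.mean(filtered):.3f}, sd {statistics.stdev(filtered):.3f}")
print(f"unfiltered, avg9: mean {statistics.mean(unf_avg9):.3f}, sd {statistics.stdev(unf_avg9):.3f}")
```

The compensation only works if filtration affects precision alone; if unfiltered samples were biased rather than merely noisier, no amount of averaging would fix them.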

