Obtaining raw data from proprietary HPLC analysis programs
HPLC software suites from different vendors are often quite different, and so is the way each one exports raw ASCII spectral data. However, in the handful of such programs that I have dealt with, the option for exporting raw data in text format can usually be found near the 3D field representation of the diode array detector (DAD) data.
Preparing the raw data
Most proprietary HPLC software suites put a good deal of information in the header of the exported raw text data. The HPLC DAD Reader is not designed to interpret this header information, so simply delete all of it. You can also rename the file and use any extension you like (e.g., .txt).
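If you would rather not delete the header by hand, a minimal sketch like the one below can do it. It assumes the data rows are whitespace-delimited numbers and that everything above the first such row is header. The function names are hypothetical and are not part of the HPLC DAD Reader. Note that a row of wavelength labels (190 191 192 ...) is also all numbers and would be kept, so check the first line of the output.

```python
# Hypothetical helpers, not part of the HPLC DAD Reader script.
# Assumption: data rows are whitespace-delimited numbers and the vendor
# header sits entirely above the first such row.


def _is_number(token):
    """True if token parses as a float (commas such as 1,000 are tolerated)."""
    try:
        float(token.replace(",", ""))
    except ValueError:
        return False
    return True


def strip_header(lines):
    """Return the lines from the first all-numeric row onward."""
    for i, line in enumerate(lines):
        tokens = line.split()
        if tokens and all(_is_number(t) for t in tokens):
            return lines[i:]
    return []  # no numeric rows found


def strip_header_file(src, dst):
    """Copy src to dst without its header lines."""
    with open(src) as f:
        lines = f.read().splitlines()
    with open(dst, "w") as f:
        f.write("\n".join(strip_header(lines)) + "\n")
```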
The raw data may also contain characters that the NumPy function ‘genfromtxt’ cannot interpret. Here is a simple Python script that can be used to remove unwanted characters, such as the commas in numbers greater than or equal to 1000 (1,000).
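A minimal sketch of such a cleanup step, which strips thousands separators and then hands the text to numpy.genfromtxt, could look like this. It assumes the file is whitespace-delimited rather than comma-delimited, since in a comma-delimited file a value like 3,125 would be ambiguous. The function names are hypothetical.

```python
import io
import re

import numpy as np

# Assumption: columns are separated by whitespace, so any comma that sits
# between a digit and exactly three more digits is a thousands separator
# (as in "1,000" or "12,345,678") and can safely be dropped.
THOUSANDS_COMMA = re.compile(r"(?<=\d),(?=\d{3}(?!\d))")


def remove_thousands_separators(text):
    return THOUSANDS_COMMA.sub("", text)


def parse_dad_text(text):
    """Parse raw text into a 2-D float array (one row per spectrum scan)."""
    # genfromtxt reads from any file-like object and, with the default
    # delimiter=None, splits on runs of whitespace.
    data = np.genfromtxt(io.StringIO(remove_thousands_separators(text)))
    return np.atleast_2d(data)  # keep 2-D even for a single-row file


def load_dad_file(path):
    with open(path) as f:
        return parse_dad_text(f.read())
```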
Both of these steps have already been done in the HPLC DAD example file.
Settings for opening the raw text file
There are several settings in the HPLC DAD Reader Python file that you, the user, must edit.
[1] the file name:
file_name = 'your_file_name.txt' (any file extension will work)
[2] time interval between spectrum scans:
time_interval = 0.2 (a reading was taken every 1/5th of a second in the example file)
[3] the minimum and maximum wavelengths (in nm) recorded in the data file:
min_wavelength = 190
max_wavelength = 350
[4] the wavelength range that you’re interested in looking at:
(You can view the entire range if you like. I prefer to look only from 210 nm to 300 nm when analyzing nucleosides.)
selected_min_wavelength = 210
selected_max_wavelength = 300
[5] the column offset:
(Some raw spectral data files have a time stamp or some other value at the beginning of each spectrum read. You must ‘jump over’ these columns by setting ‘column_offset’ to the index of the column that holds the first wavelength. Python uses zero as the first index in a list, so if my file has two columns of unneeded data (index 0 = junk1, index 1 = junk2) and index 2 holds the first wavelength, I would set column_offset = 2.)
column_offset = 2 (0 if absorbance data is in the first column)
[6] wavelength interval:
All of the diode array detectors that I have worked with measure at every nanometer between the minimum and maximum wavelength, but if your wavelengths are read at a different interval, change:
wave_interval = 1
to an appropriate number.
[7] Choose the wavelength that will represent the data in the main figure, which plots the absorbance at that wavelength (y) over time (x):
select_wavelength = 254
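To make the effect of these settings concrete, here is a small sketch of how they could map a wavelength to a column index and a scan number to a time. It assumes one row per spectrum scan, then column_offset leading columns, then one column per wavelength from min_wavelength upward, with the first scan at time zero. The helper functions are hypothetical; the HPLC DAD Reader may compute these differently.

```python
import numpy as np

# Settings as given for the example file.
time_interval = 0.2
min_wavelength = 190
max_wavelength = 350
selected_min_wavelength = 210
selected_max_wavelength = 300
column_offset = 2
wave_interval = 1
select_wavelength = 254

# Number of absorbance columns implied by the wavelength settings.
n_wavelengths = int((max_wavelength - min_wavelength) / wave_interval) + 1


def wavelength_column(wavelength):
    """Column index of `wavelength` in the raw data array (hypothetical helper)."""
    if not min_wavelength <= wavelength <= max_wavelength:
        raise ValueError(f"{wavelength} nm is outside the recorded range")
    steps = (wavelength - min_wavelength) / wave_interval
    if not float(steps).is_integer():
        raise ValueError(f"{wavelength} nm is not on the {wave_interval} nm grid")
    return column_offset + int(steps)


def selected_columns():
    """Slice covering selected_min_wavelength..selected_max_wavelength inclusive."""
    return slice(wavelength_column(selected_min_wavelength),
                 wavelength_column(selected_max_wavelength) + 1)


def time_axis(n_scans):
    """Time of each scan, assuming the first scan is at t = 0."""
    return np.arange(n_scans) * time_interval
```

With the example settings, wavelength_column(254) is 66 (2 offset columns plus 64 one-nanometer steps from 190 nm), and a full row should have column_offset + n_wavelengths = 163 columns.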
The main figure – plot of absorbance over time
Once all of the settings are established, simply run the script to plot the absorbance at ‘select_wavelength’ against time, where time is the ‘time_interval’ multiplied by the number of spectrum reads. Below is the plot from the example file. The data were generated from Trypanosoma brucei cytosolic tRNA that was digested to nucleosides with nuclease P1 and run through a C-18 reverse-phase column (254 nm is the most commonly used wavelength for analyzing general nucleic acid absorbance).
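A minimal sketch of how such a main figure could be drawn with matplotlib, assuming the raw data are already loaded into a NumPy array laid out as described in the settings section. The function names are hypothetical. matplotlib is imported inside the plotting function only so that the array helper can be used without it. Passing picker=True makes the plotted line emit pick events, which is what an onpick(event) callback receives.

```python
import numpy as np


def chromatogram(data, wavelength, *, time_interval, min_wavelength,
                 column_offset, wave_interval=1):
    """Return (times, absorbance) at one wavelength (hypothetical helper)."""
    column = column_offset + int(round((wavelength - min_wavelength) / wave_interval))
    times = np.arange(data.shape[0]) * time_interval  # first scan at t = 0
    return times, data[:, column]


def plot_main_figure(times, absorbance, wavelength, onpick=None):
    """Plot absorbance (y) against time (x) and optionally wire up picking."""
    import matplotlib.pyplot as plt  # imported here so chromatogram() works without it

    fig, ax = plt.subplots()
    # picker=True lets clicks near the line generate 'pick_event's;
    # pickradius is the click tolerance in points.
    ax.plot(times, absorbance, picker=True, pickradius=5)
    ax.set_xlabel("Time")
    ax.set_ylabel(f"Absorbance at {wavelength} nm")
    if onpick is not None:
        fig.canvas.mpl_connect("pick_event", onpick)
    return fig, ax
```

For the example file this would be called as chromatogram(data, 254, time_interval=0.2, min_wavelength=190, column_offset=2), followed by plot_main_figure on the result and plt.show().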
Interacting with the main figure
In ‘onpick(event)’, choose one of the two modes, either ‘spectrum snapshot’ or ‘peak processing’, e.g.:
mode = 'peak processing'
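One way to structure such a mode switch is a small callable that matplotlib calls on every pick event. The sketch below is an assumption about structure, not the HPLC DAD Reader's own onpick. In ‘spectrum snapshot’ mode it reports the clicked scan index; in ‘peak processing’ mode it waits for two clicks and reports them as a (start, end) pair. It relies on the event.ind attribute that matplotlib sets for pick events on plotted lines (the indices of the data points near the click).

```python
class PickHandler:
    """Route matplotlib pick events by mode (hypothetical helper)."""

    MODES = ("spectrum snapshot", "peak processing")

    def __init__(self, mode, on_snapshot, on_peak):
        if mode not in self.MODES:
            raise ValueError(f"mode must be one of {self.MODES}, got {mode!r}")
        self.mode = mode
        self.on_snapshot = on_snapshot  # called as on_snapshot(scan_index)
        self.on_peak = on_peak          # called as on_peak(start, end)
        self._front = None              # first click of a peak, if pending

    def __call__(self, event):
        # For a plotted line, event.ind holds the indices of the data points
        # near the click; use the first one as the clicked scan.
        index = int(event.ind[0])
        if self.mode == "spectrum snapshot":
            self.on_snapshot(index)
        elif self._front is None:
            self._front = index  # remember this boundary, wait for the other
        else:
            start, end = sorted((self._front, index))
            self._front = None
            self.on_peak(start, end)
```

It would be connected with fig.canvas.mpl_connect("pick_event", PickHandler("peak processing", on_snapshot=..., on_peak=...)), where the two callbacks are your own functions. Sorting the two clicks means the front and back base can be clicked in either order.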
Currently, there are two ways to look at spectrum profiles:
[1] Spectrum snapshots
mode = 'spectrum snapshot'
Simply click anywhere on the plotted line of the main figure to pull up the spectrum profile that corresponds to that point in time.
This technique is fine for looking at medium- to high-abundance peaks, where background absorbance plays a minimal role. The example above, however, is of a dinucleoside in extremely low abundance. Compare its absorbance at 254 nm with that of the other nucleosides in the same sample.
Background absorbance and UV-Vis detector noise play a dominant role in such low-abundance peaks. ‘Peak processing’ handles this weak data and can generate an interpretable spectrum profile.
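A spectrum snapshot amounts to reading one row of the data array. A minimal sketch, assuming the data layout from the settings section; the scan index can come straight from event.ind in a pick callback, or be computed from a time on the x axis. Both function names are hypothetical.

```python
import numpy as np


def scan_index_from_time(t, time_interval):
    """Nearest scan number for a time on the main figure's x axis."""
    return int(round(t / time_interval))


def spectrum_snapshot(data, scan_index, *, min_wavelength, column_offset,
                      wave_interval=1, selected_range=None):
    """Wavelengths and absorbances of a single scan (hypothetical helper)."""
    spectrum = data[scan_index, column_offset:]  # skip the offset columns
    wavelengths = min_wavelength + wave_interval * np.arange(spectrum.size)
    if selected_range is not None:
        lo, hi = selected_range  # e.g. (selected_min_wavelength, selected_max_wavelength)
        keep = (wavelengths >= lo) & (wavelengths <= hi)
        wavelengths, spectrum = wavelengths[keep], spectrum[keep]
    return wavelengths, spectrum
```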
[2] Peak processing
mode = 'peak processing'
Click on two points around a peak, one at its front base and one at its back base. The program uses these two points to establish the background absorbance, which is then subtracted from each spectrum profile that lies between them. The background-subtracted data within these two boundaries are then processed into one averaged spectrum profile, and absorbance intensity is converted to relative intensity (relative to the total intensity in the averaged spectrum profile).
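The description above does not specify exactly how the background is established from the two boundary points. The sketch below assumes a straight-line baseline in time between the spectrum at the front base and the spectrum at the back base, evaluated separately at each wavelength. That baseline model and the function name are assumptions, not the HPLC DAD Reader's documented method.

```python
import numpy as np


def process_peak(spectra, start, end):
    """Background-subtracted, averaged, relative spectrum for one peak.

    spectra: 2-D array, one row per scan, one column per wavelength
             (offset columns already removed).
    start, end: scan indices of the front and back base of the peak.
    """
    spectra = np.asarray(spectra, dtype=float)
    if not 0 <= start < end < len(spectra):
        raise ValueError("need 0 <= start < end < number of scans")
    block = spectra[start:end + 1]
    front, back = spectra[start], spectra[end]
    # Assumed baseline: linear in time between the two boundary spectra,
    # computed independently at every wavelength.
    frac = np.linspace(0.0, 1.0, len(block))[:, np.newaxis]
    background = front + frac * (back - front)
    averaged = (block - background).mean(axis=0)
    total = averaged.sum()
    if total <= 0:
        raise ValueError("no absorbance above background between the two points")
    # Relative intensity: each wavelength divided by the summed intensity.
    return averaged / total
```

If the raw array still contains the offset columns, drop them first with spectra = data[:, column_offset:].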
After peak processing, the spectrum profile of the same OHyWpA shown in the ‘snapshot’ example looks radically different. This is mostly due to the background subtraction of the spectral profiles within the peak of interest. Averaging across the profiles and the Savitzky-Golay smoothing algorithm also contribute to the transformation, giving a better representation of the absorbing properties of the input sample. The absorbance intensity also becomes a relative absorbance: the intensities at all wavelengths of the averaged profile are summed, and each wavelength's intensity is then divided by that sum, so the relative intensities sum to 1. This allows direct comparison of peaks with different overall intensities (most likely due to differences in abundance). It also allows the use of the Kolmogorov-Smirnov statistic, which I have been using as a way to measure the dissimilarity between spectrum profiles.
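Two pieces of this paragraph can be sketched in a few lines of NumPy. The smoothing below is a textbook Savitzky-Golay filter (a least-squares polynomial fit over a sliding window); the window of 7 points and polynomial order of 2 are arbitrary defaults, not the script's settings, and the ends are padded by repeating the edge values. If SciPy is installed, scipy.signal.savgol_filter(y, window_length, polyorder) gives the same interior values and handles the ends differently. The Kolmogorov-Smirnov statistic is computed here by treating two profiles on the same wavelength grid as discrete distributions and taking the largest gap between their cumulative relative intensities. That is an assumption about how the comparison is done, and it yields only the statistic, not a p-value.

```python
import numpy as np


def savgol_coefficients(window, polyorder):
    """Smoothing weights of a Savitzky-Golay filter (centre-point value)."""
    if window % 2 == 0 or window <= polyorder:
        raise ValueError("window must be odd and greater than polyorder")
    half = window // 2
    x = np.arange(-half, half + 1)
    # Least-squares polynomial fit over the window: row 0 of the
    # pseudo-inverse maps the window's values to the fitted value at x = 0.
    design = np.vander(x, polyorder + 1, increasing=True)
    return np.linalg.pinv(design)[0]


def savgol_smooth(y, window=7, polyorder=2):
    """Savitzky-Golay smoothing; ends padded by repeating the edge values."""
    y = np.asarray(y, dtype=float)
    half = window // 2
    padded = np.pad(y, half, mode="edge")
    # The weights are symmetric, so convolution equals the sliding fit.
    return np.convolve(padded, savgol_coefficients(window, polyorder), mode="valid")


def relative_intensity(profile):
    """Divide each wavelength's intensity by the summed intensity."""
    profile = np.asarray(profile, dtype=float)
    return profile / profile.sum()


def ks_statistic(p, q):
    """Largest gap between the cumulative relative intensities of p and q."""
    p, q = relative_intensity(p), relative_intensity(q)
    if p.shape != q.shape:
        raise ValueError("profiles must share the same wavelength grid")
    return float(np.max(np.abs(np.cumsum(p) - np.cumsum(q))))
```

Identical profiles give 0, and profiles with no overlapping intensity give 1, so larger values mean more dissimilar spectra.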
Exporting spectrum profile data
Each time peak processing is used, the data used to make the final plot are exported as text to the file ‘peak processed spectra.txt’. These data can easily be copied and pasted into Excel or other data-processing software.
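The exact layout of ‘peak processed spectra.txt’ is not specified here, so the following is only a sketch of one paste-friendly format: a comment line labelling the peak, a header row, then one tab-separated wavelength and relative-intensity pair per line. The file is opened in append mode so each new peak adds a block rather than overwriting earlier ones. The function names and label text are hypothetical.

```python
def format_peak_spectrum(wavelengths, relative, label):
    """Tab-separated block for one processed peak (hypothetical format)."""
    lines = [f"# {label}", "wavelength_nm\trelative_intensity"]
    lines += [f"{float(w):g}\t{float(r):.6g}" for w, r in zip(wavelengths, relative)]
    return "\n".join(lines) + "\n"


def append_peak_spectrum(wavelengths, relative, label,
                         path="peak processed spectra.txt"):
    """Append one peak's block to the export file."""
    with open(path, "a") as fh:
        fh.write(format_peak_spectrum(wavelengths, relative, label))
```

Tab-separated text pasted into a spreadsheet lands in separate columns, which keeps the copy-and-paste step simple.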