Thanks for your answers. In my model, the signal is defined as S = B + P + N, where B is a scalar constant denoting the baseline intensity, P the contribution of 'true' peaks to the mass spectrum signal, and N the contribution of noise. I also postulate that the noise vector values are normally distributed around zero.
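For concreteness, here is a small Python sketch of that model (my actual code is MATLAB; all parameter values below are made up for illustration): a constant baseline plus sparse positive peaks plus zero-mean Gaussian noise.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical synthetic spectrum following the model S = B + P + N:
# constant baseline B, sparse strictly-positive 'true' peaks P,
# and zero-mean Gaussian noise N. All values are illustrative.
n_points = 5000
B = 2500.0          # baseline intensity (scalar constant, assumed value)
sigma = 150.0       # noise standard deviation (assumed value)

P = np.zeros(n_points)
peak_idx = rng.choice(n_points, size=40, replace=False)
P[peak_idx] = rng.uniform(5e3, 5e4, size=40)   # sparse peak contributions

N = rng.normal(0.0, sigma, size=n_points)      # noise ~ N(0, sigma^2)
S = B + P + N                                  # observed signal
```

Because the peaks are sparse, the bulk of S sits near B, which is what the histogram/CDF analysis below exploits.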
Based on your suggestions, I developed an algorithm to extract baseline intensity and noise standard deviation using histogram analysis procedures.
For a spectrum, if I plot the cumulative distribution, I obtain the following result:
By looking at the low percentile values (below ~2800 counts), we can see that the distribution behaves like a normal CDF. So I wrote a function that, for indices below a given bound in the percentile vector, computes the sum of squared deviations (SSD) between the theoretical normal CDF values and those observed experimentally; it also returns the corresponding mean (baseline intensity) and standard deviation (noise) estimated from the input signal. I then minimize the SSD by adjusting the bounding index with fminbnd.
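A rough Python translation of that procedure could look like the sketch below, with scipy's `minimize_scalar(method='bounded')` playing the role of fminbnd and `np.quantile` that of prctile. The per-point normalization of the SSD and the way the mean/std are estimated from the truncated signal are my assumptions, not necessarily the original code.

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import norm

def fit_baseline_noise(S, n_levels=200):
    """Fit a normal CDF to the low-percentile part of the intensity
    distribution of S; return (baseline, noise_std, cutoff_value)."""
    q = np.linspace(0.005, 0.995, n_levels)   # percentile levels in (0, 1)
    v = np.quantile(S, q)                     # empirical quantiles (prctile analog)

    def params_below(k):
        # mean/std of all signal values not exceeding the k-th quantile value
        sub = S[S <= v[k - 1]]
        return sub.mean(), sub.std()

    def mean_sq_dev(kf):
        # mean squared CDF deviation below index k (normalized so small k
        # is not trivially favored -- an assumption on my part)
        k = int(round(kf))
        mu, sd = params_below(k)
        if sd <= 0:
            return np.inf
        return np.mean((norm.cdf(v[:k], mu, sd) - q[:k]) ** 2)

    res = minimize_scalar(mean_sq_dev, bounds=(10, n_levels), method='bounded')
    k_opt = int(round(res.x))
    mu, sd = params_below(k_opt)
    return mu, sd, v[k_opt - 1]

# --- tiny demo on synthetic data (arbitrary values) ---
rng = np.random.default_rng(0)
S = 2500.0 + rng.normal(0.0, 150.0, size=5000)   # baseline + noise
idx = rng.choice(5000, size=40, replace=False)
S[idx] += rng.uniform(5e3, 5e4, size=40)         # sparse peaks
baseline, noise_sd, cutoff = fit_baseline_noise(S)
print(baseline, noise_sd, cutoff)
```

On this synthetic spectrum the optimizer pushes the bounding index up to just below the peak region, so the recovered mean and standard deviation land close to the true baseline and noise level.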
It seems to work properly. In the present example, the bounding index was optimized to a percentile value near 3000 counts. Just below is the comparison between the experimental and optimized CDF values.
If I calculate baseline intensity and noise from that optimized index, I can represent the results on my spectrum:
Clearly, we have a good solution. The procedure was repeatable across the different spectra acquired during the LC-MS analysis, and I'm able to separate the instrumental noise from the chemical noise (which can be interpreted as baseline nonlinearity at low m/z values). Depending on the step value used for the cumulative distribution calculation with prctile, the algorithm takes more or less time. With an appropriate step, good solutions are obtained in under 0.1 s.
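On the step-size point: a quick toy check (Python, synthetic noise-only data with made-up parameters) suggests that even a coarse percentile grid pins down the mean and standard deviation well, which is why the step mostly trades runtime rather than accuracy. Here the parameters are read off a normal Q-Q line fit rather than my actual estimator.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
S = rng.normal(2500.0, 150.0, size=100_000)    # noise-only signal, arbitrary parameters

for step in (5.0, 1.0, 0.1):                   # percentile step in %, coarse to fine
    q = np.arange(step, 100.0, step) / 100.0   # percentile levels, endpoints excluded
    v = np.quantile(S, q)                      # prctile analog
    sd, mu = np.polyfit(norm.ppf(q), v, 1)     # v ~ mu + sd * z on a normal Q-Q line
    print(f"step={step}%: levels={len(q)}, mu={mu:.1f}, sd={sd:.1f}")
```

Even with a 5% step (19 quantile levels) the recovered mu and sd are close to the finest grid, so a coarse step is a cheap way to stay under the 0.1 s budget.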
You will find attached the code and the example file.