Understanding accuracy of expressions #464

zhuyi-bjut · 2023-11-14T05:32:02Z

Hello!
In my recent research, I used PySR for some symbolic regression tasks. I found that PySR's loss is even smaller than an ANN's in some cases. How can I explain this magic of PySR? Why would a low-dimensional expression give a better result than a high-dimensional network?
Thanks!

MilesCranmer · 2023-11-14T08:24:23Z

Hi @prozhuyi,

Thanks for this. Yes, I also find that symbolic expressions sometimes beat neural nets on specific problems. It really comes down to priors over the space of functions. When you train a neural net, there is an implicit prior that the function will be smooth, along with other properties.

Symbolic regression imposes a different prior over the space of functions. Sometimes this prior is superior to the neural net prior, especially if the operators you are using form an efficient basis for describing your field.

cheers,
Miles
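
To make the point about an "efficient basis" concrete, here is a minimal sketch under stated assumptions. The target `y = 2 / x` and the sampling range are hypothetical, and a degree-5 polynomial stands in for a generic smooth function approximator (it is not a neural network, and this is not how PySR searches). It only illustrates that a one-parameter model built from the right operator can reach a lower loss than a more flexible model with the wrong basis.

```python
import numpy as np

# Hypothetical target that a division operator describes exactly.
x = np.linspace(0.1, 2.0, 200)
y = 2.0 / x

# Generic smooth model: degree-5 polynomial (6 free parameters).
poly_coeffs = np.polyfit(x, y, deg=5)
mse_poly = np.mean((np.polyval(poly_coeffs, x) - y) ** 2)

# Expression-style model: a / x, with 1 free parameter fit by least squares.
basis = 1.0 / x
a = np.dot(basis, y) / np.dot(basis, basis)
mse_expr = np.mean((a * basis - y) ** 2)

print(f"polynomial MSE: {mse_poly:.3e}, expression MSE: {mse_expr:.3e}")
```

If the operator set did not include division, the comparison could easily go the other way, which is the sense in which the prior matters.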

zhuyi-bjut · 2023-11-14T08:29:11Z

I think I understand now! Thank you for your answer!

zhuyi-bjut · 2023-11-16T15:00:00Z

Hello Miles @MilesCranmer,

I have another question, about the `score` given by PySR. How is this score obtained? Is it computed in this step?

```python
if lastMSE is None:
    cur_score = 0.0
else:
    if curMSE > 0.0:
        # TODO Move this to more obvious function/file.
        cur_score = -np.log(curMSE / lastMSE) / (curComplexity - lastComplexity)
    else:
        cur_score = np.inf
```

And what is its significance?
Thanks again!

MilesCranmer · 2023-11-16T22:51:41Z

Yes, that is the score. It is basically a heuristic that looks for sharp decreases in loss as complexity increases (a traditional metric for the "best" equation in SR). There are more details on this in the PySR paper: https://arxiv.org/abs/2305.01582
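
As a rough illustration of that heuristic, the sketch below applies the formula from the quoted snippet to a list of equations sorted by increasing complexity. The function name `pareto_scores`, the loop around the formula, and the example numbers are assumptions made for illustration; they are not taken from the PySR source.

```python
import numpy as np

def pareto_scores(complexities, losses):
    """Score each equation by its drop in log-loss per unit of added complexity.

    Assumes the inputs are sorted by increasing complexity.
    """
    scores = []
    last_loss = None
    last_complexity = None
    for complexity, loss in zip(complexities, losses):
        if last_loss is None:
            score = 0.0  # the simplest equation has nothing to compare against
        elif loss > 0.0:
            score = -np.log(loss / last_loss) / (complexity - last_complexity)
        else:
            score = np.inf  # an exact fit gets the highest possible score
        scores.append(score)
        last_loss, last_complexity = loss, complexity
    return scores

# Hypothetical Pareto front: the loss drops sharply at complexity 5.
complexities = [1, 3, 5, 8]
losses = [1.0, 0.5, 0.01, 0.009]
scores = pareto_scores(complexities, losses)
best_index = int(np.argmax(scores))
print(scores, "-> highest score at complexity", complexities[best_index])
```

In this made-up example the equation at complexity 5 scores highest, because it is where the loss falls most steeply relative to the complexity added; the small further improvement at complexity 8 earns a low score.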

tanweer-mahdi · 2023-11-26T23:02:36Z

Hi @MilesCranmer,

This is a very interesting discussion. Let me elaborate on your answer a little; please correct me if I am wrong:

An ANN assumes a prior over the space of smooth functions (among other properties), whereas symbolic regression can also allow non-smooth functions, which can sometimes be a more suitable prior for a particular problem.

Is the above statement correct?

MilesCranmer changed the title ~~About the effect of pysr~~ Understanding accuracy of expressions Nov 14, 2023
