I am doing research with outcome data from a psychological treatment program that works with adolescents. A similar survey assessing mental health functioning is given to the adolescents participating in the program and to their parents at two time points: admission and discharge.

We're interested in what factors may lead to an adolescent deteriorating during their stay at the program (**adolescent discharge score - adolescent admission score** must meet certain criteria). We set up the dichotomous **"deterioration"** variable.

One hypothesis we wish to test is whether the difference between parent and adolescent scores at intake has a significant impact on whether or not the adolescent deteriorated. Even though the surveys administered to the parents and adolescents are worded mostly the same, they have different "cutoff scores" (which indicate whether or not a score falls in the clinical range). Because of this, I'm assuming I need to compare the two scores based on their respective z-scores.

After calculating the z-scores for both of these variables (adolescent_z and parent_z), I want to set up an "intake_difference" variable:

**intake_difference = parent_z - adolescent_z**
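To make the calculation concrete, here is a minimal Python sketch of what I have in mind. The scores are made-up placeholders, not our data, and standardizing each survey against its own sample mean and sample standard deviation is an assumption on my part (published norms for each survey might be the better reference).

```python
from statistics import mean, stdev

def z_scores(scores):
    """Standardize scores against their own sample mean and sample SD."""
    m = mean(scores)
    s = stdev(scores)  # sample SD (n - 1 denominator)
    return [(x - m) / s for x in scores]

# Hypothetical admission scores, one entry per adolescent/parent pair
adolescent_admission = [62.0, 48.0, 75.0, 55.0, 90.0]
parent_admission = [40.0, 52.0, 61.0, 33.0, 70.0]

adolescent_z = z_scores(adolescent_admission)
parent_z = z_scores(parent_admission)

# Positive values mean the parent rated the adolescent higher (relative
# to other parents) than the adolescent rated themselves (relative to peers)
intake_difference = [p - a for p, a in zip(parent_z, adolescent_z)]
```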

I'm thinking I want to use logistic regression to see what variables predict the **deterioration** variable. In order to do this, I plan to bin the continuous **intake_difference** variable into a discrete variable based on its standard deviation (-2, -1, 0, 1, 2), then enter that new variable into the logistic regression.
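For reference, this is roughly the binning step I am picturing, as a small Python sketch. The function name `bin_by_sd` and the example values are hypothetical, and I am assuming the bins are measured in standard deviations from the mean of the difference scores, rounded to the nearest whole SD and capped at -2 and 2.

```python
import math
from statistics import mean, stdev

def bin_by_sd(values, lowest=-2, highest=2):
    """Assign each value to the nearest whole-SD bin, capped at the ends."""
    m = mean(values)
    s = stdev(values)  # sample SD of the difference scores themselves
    bins = []
    for v in values:
        sd_units = (v - m) / s
        nearest = math.floor(sd_units + 0.5)  # round half up
        bins.append(max(lowest, min(highest, nearest)))
    return bins

# Hypothetical difference scores (parent_z - adolescent_z)
example_differences = [-3.0, -1.0, 0.0, 1.0, 3.0]
binned = bin_by_sd(example_differences)
```

Note that the difference of two z-scores does not itself have a standard deviation of 1, which is why the sketch rescales by the SD of the differences before binning.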

Does this approach make sense? Is there a better, cleaner way of doing this? Is binning the data based on standard deviation good practice? Thank you so much for your help!