I hope the stuff that I’m gonna present today is useful to you as a clinician, and I will try to avoid talking about anything in terms of formulas or anything complicated in terms of statistics. Okay, so I’m basically gonna start with just a brief overview of the review process, then walk through each section of a manuscript with you and talk about some of the common mistakes or things that I’m looking for when I’m reviewing papers, and then finish with a brief discussion.

So just to give you some background on myself: in addition to my responsibilities here at Columbia, I’m also a deputy statistical editor for The Journal of Thoracic and Cardiovascular Surgery, and I’m also a statistical editor for the American Journal of Drug and Alcohol Abuse. I’ve reviewed a lot of papers in the last few years, and I’ve reviewed for other journals as well. I just wanted to state that these are my personal thoughts, not necessarily the thoughts of the journals. This is from my own personal experience.

I wanted to make a mention of the fact that when we review papers, it’s a service to the field. I don’t get paid to do the reviews, and often there’s a quick turnaround, so 14 days, 3 weeks. So as someone who’s writing a paper, you want to make the review process as easy as possible for the reviewers. And if you’re asked to review a paper, I encourage you to take that opportunity and do a good job: provide useful feedback. So, some things to keep in mind.

Pretty much all top-tier journals will require a statistical review of a manuscript, and many lower-tier journals also require this. In fact, some journals also require what’s called a quantitative author, so someone with a methods background. There’s different criteria for this; sometimes they require one of the authors to have an MPH or a PhD in the field. But at the same time, the statistical reviewers may not be statisticians. (There’s still some seats in the front if you want to come.) Right, so again, not all statistical reviewers are statisticians; we’ll talk about that later in the talk. And the statistical reviewer often works in the field but is not necessarily an MD. So I’m not an MD; I don’t often know all the terminology. So you should make sure that your paper is readable by a general audience: have someone who’s not working on the project review the paper to make sure that it’s accessible.

Right, so the advice I’m going to talk about today in the context of writing papers is also applicable to grant writing. And then the second-to-final point is that the devil is really in the details. I’ve found statisticians to be among the most detail-oriented people, almost to a fault, so if there’s a mistake in your paper, we’re probably gonna find it. So pay attention as closely as possible to the details.

And then the last thing, and I take this really seriously: as a statistician, I feel almost like an ethical gatekeeper. So are the interpretations in line with the results and the study design? Are the data being analyzed appropriately?

Are they acknowledging what limitations there are to their study? I don’t want a paper to go out into the world unless I’m sure that it’s ethically sound. So this is a huge responsibility that I take on.

So we’ll start first with the abstract. The abstract is the most important part of the manuscript. Why? Anyone want to shout out an answer? In reality, what are people gonna read in your manuscript? Probably just the abstract. So this is really where I’m gonna spend a lot of attention as a reviewer, to make sure that what’s presented there accurately portrays the research that was done and tells the story of the paper. So keep it simple, and whether it’s structured or unstructured, it should have the same main parts as the manuscript.

So what should I be able to identify when I read the abstract as a statistical reviewer? Well, I want to know: what’s the background, what gap in the literature is this study filling? What are the study aims? What is the study design? What is my main exposure variable, what is my main outcome variable, and how are they measured? What is my population of interest, and who is my study sample? There should also be key findings for each aim. I realize you have a limited amount of space, so this is difficult to do, but ideally you should have findings for each of your aims. And then lastly, something about the impact of the findings: why should I care, how is this gonna change practice?

And to be honest, I don’t really care so much about the statistical details in the abstract. I can read the methods section for that later; I’m much more interested in these other issues as the reviewer. So don’t try and fit in, you know, “I fit a multilevel model, blah blah blah.”

Just say something simple about how you analyzed the data. And just so you know, we will be posting these slides, so you don’t need to take notes right now if you don’t want to. Any questions on the abstract? All right, so again, it’s the most important part of the paper; you really want to tell your story there.

So, the introduction. These are some of the key questions that I ask myself as I read an introduction. What do we already know about this problem? Is the problem clearly stated? I can’t tell you how many times I’ve reviewed papers and I’m not even sure what they’re trying to do. If that’s the case, it doesn’t matter what your methods look like later: if I don’t know what question you’re trying to answer, I can’t evaluate the rest of the paper.

So what gap is the study filling in the literature? Why is this important, why should I care? And then what are your study aims, and are they congruent with the problem that you just built up? So be as explicit as possible. Ideally, state a hypothesis. Not all research is hypothesis driven; in that case you should state that these are exploratory analyses. And then one last reminder: make sure that the aims you list in your introduction match the results you’re going to present later. Often I’ll read the aims, move to the next section, get to the results, and find things in there that were not stated in the introduction. You want to have congruence between the two.

So to get a little bit more technical: when I’m reading your introduction, it should be clear to me what X, Y and P are. This is gonna be the extent of my algebra here. X is my exposure variable, Y is my outcome, and P is my population of interest. These are the three key pieces of information that I need to know to move forward with the paper. And again, I should also have some idea of the study design. You don’t have to provide all the details, but I should know: is this a cohort, is this an RCT, is it cross-sectional? Because that’s gonna frame the rest of the review process for me. You’re gonna have plenty of space later to describe the design, but include one or two sentences on the design in the introduction to frame the rest of the paper.

Questions about the introduction? We’re also gonna have time at the end for questions, so if you want to save them for then, that’s fine as well. Right, so tell me what the story is, why we are researching this particular topic, and then give me some more details: what is X, what is Y, what is P, and how are we gonna examine those things?

So for the methods section, we’re gonna walk through several different parts. The first part is the study sample. I need to know who you included in your study, who you excluded, and whether those are reasonable inclusion and exclusion criteria. You should also tell me when you conducted your study, so when did you collect your data? What is your sample size? This seems obvious, but it’s often forgotten. How many people did you include in your study? Even if it’s in the table later, put it up front so people know what they’re working with.

You should also include some sort of study flow chart. For a clinical trial, there are very standard reporting guidelines called CONSORT. I have a picture there of what a typical CONSORT diagram looks like; you can go to their website to download the template. But I also wanted to say that CONSORT is a lot more than just a diagram. There’s a lot of reporting criteria that you need to meet with an RCT, and that website’s a really great resource for that.

And then this EQUATOR Network is another great resource for other study designs. They’ve tried to take what’s presented in CONSORT for RCTs and apply it to other study designs. So if you’ve done a cohort, case-control, all sorts of designs, they try to give reporting criteria there for them as well. And when you’re working through this diagram, make sure all the numbers add up. Again, it seems like common sense, but often that’s not what happens. I will go through and count up the people, and sometimes, you know, the n’s don’t match. And then the last part related to the study sample is that you need to tell me something about this group of people. We always want to have that Table 1 with study demographics: who are the participants in this study?

In the study design section, you should also say something about IRB approval: whether or not you had approval, or if you had a waiver, you have to say something about that. Pretty much all of the journals I review for require me as a reviewer to check a box saying that I checked for that piece of information in the paper, and if it’s not there, your paper’s not going to get accepted. If it’s an RCT, we also have to make sure that it’s registered in clinicaltrials.com or some equivalent. Actually, that should be ClinicalTrials.gov, but basically your RCT needs to be registered.

So now we’re going to come back to X, Y and P. So again, what is your exposure, what is your outcome, what is your population of interest?

In terms of your exposure and your outcome, things that I want to know are: how are you measuring that? How often are you measuring it? What types of variables are they: are they binary, are they continuous? What is your primary endpoint? Maybe you’ve collected multiple measures over time, but one particular assessment is of interest. You need to provide all of that information to the reader. And if you’re doing some sort of longitudinal study with follow-up, you need to tell us about loss to follow-up: how many people dropped out, why did they drop out, and how might that impact your results? These are the sorts of things that I’m starting to think about when I read the study design section.

So as a reviewer, my main goal when reading the design is to answer the following questions. Is the design appropriate to answer the questions they are interested in? Related to the design, what biases might be introduced by using such a design? I’m gonna be thinking about this as I read the paper, and hopefully when I get to your limitations section at the end of the paper, you’re gonna say something about the study design and how that might limit the generalizability or the interpretation of your findings.

I’m also looking for something about a sufficient number of patients and/or events to answer the study questions. So I’m thinking about power and sample size. If necessary, and depending on the journal and the type of study you’re doing, you might include your power and sample size calculations in the manuscript. If you’re gonna do this, you need to give me all of the information I need to replicate the results. And I have in fact done this myself: whenever I review papers, especially for clinical trials, I’ll plug the numbers into my software and make sure I get the same numbers as you.

And one big warning: make sure you don’t confuse the total sample size with the per-arm sample size. A lot of software will compute the sample size for you, and it’ll tell you you need 50 people, but really what it means is that you need 50 controls and 50 treatment people. This has happened in real life. I reviewed a paper; they had the protocol, everything was in line, but they misinterpreted their power and sample size, so they had half the number of people they needed, and they had a null finding. So it was really hard to interpret that paper given that mistake.
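As a rough illustration of the per-arm versus total warning, here is a minimal sketch of a sample size calculation for comparing two means. It is not the software used in the talk: the function name `n_per_arm`, the standardized effect size of 0.5, and the normal approximation are all assumptions made for the example. Exact t-based calculations in real software usually give a slightly larger number.

```python
import math
from statistics import NormalDist

def n_per_arm(effect_size, alpha=0.05, power=0.80):
    """Sample size PER ARM for a two-sided, two-sample comparison of means.

    Uses the normal approximation n = 2 * ((z_alpha + z_beta) / d)^2,
    where d is the standardized difference (hypothetical input).
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for the two-sided test
    z_beta = z.inv_cdf(power)           # quantile matching the desired power
    return math.ceil(2 * ((z_alpha + z_beta) / effect_size) ** 2)

per_arm = n_per_arm(effect_size=0.5)
total = 2 * per_arm  # two arms: the trial needs twice the per-arm number
print(f"per arm: {per_arm}, total: {total}")
```

Under these assumptions the result is 63 per arm, so 126 in total. Reading the 63 as the total would leave the study with half the people it needs, which is the mistake described above.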

So they had carried out all of this research, spent all this money, and now we’re not sure what to do with it. So please be careful: make sure you double and triple check your software if you’re using it. So, any questions about study design and how to present that?

[waiting for questions in the audience]

Yes, so if you check out this EQUATOR Network, they tried to basically come up with alternatives to CONSORT for those other types of study designs. So you can go in there and look for case-control studies: what should you be including? They’ll give you more details on that. But you’re right, it might not be exactly the same as the CONSORT diagram, but there is an equivalent if you’re interested in looking at it. And not all journals require you to do this, but I do find it’s a useful resource as you’re preparing your paper, to think about what sorts of things need to end up in the paper even if it’s not required. Any other questions?

Yep, sure. So let’s say you’re conducting a study and you’re following people up for a year. It would be useful then for me as a reviewer to know: how many people, let’s say, made it to three months, how many people made it to six months, nine months, and a year? If you did your study for a year and you only had 10% of your sample left at the end of the year, is that really interesting and relevant? So that’s what I mean by that: just provide some details on when people are dropping out and at what rate. And of course you’re gonna have to deal with that missing data later. You could also report, like if you’re doing a survival study, median follow-up time, something like that, just to give a sense of how long people are remaining in the study. Great questions.

So the statistical methods section is probably the section I spend the most time reading. The goal when you’re writing this section is that another person should be able to read it and replicate your results. I realize that you have a limited amount of space in your manuscript, and I’ll give a suggestion for how to handle this at the end of the talk. But specifically, what am I looking for? I’m looking for methods that are specified for each objective. So if you have three aims, what are the analyses you’re going to use for aim one, aim two, and aim three? What tests are you gonna use? And then how are you gonna handle the missing data?

All right, so missing data is a reality of doing research. You always try to minimize missing data, but even given the best methods to ensure complete data, we’re always gonna have things that come up. So again, report how much missing data there is. If you’re gonna do a complete case analysis, which means you’re only gonna include people who actually gave you data or completed the trial, you need to think about what the potential biases are and mention them in your limitations. Are you gonna use imputation methods? There’s a whole universe of imputation methods; what’s appropriate for your study?

And I just wanted to warn against last observation carried forward. In RCTs, a lot of people tend to use this last observation carried forward method, which means if someone drops out, you take the last available measurement that you have on that person and carry it forward to their endpoint. There’s a lot of bias associated with doing that. There are much better methods out there to handle missing data, so do not use last observation carried forward.

The other thing I will say relating to missing data is: don’t pretend like you didn’t have missing data in your paper. That’s a huge warning sign to me that you had something major go wrong with your study. If you just completely sidestep the issue, as a reviewer I’m gonna be like, wait, what happened, why are they not mentioning this? So you have to be upfront, be explicit, and be as detailed as possible. And again, you may not necessarily be able to do some of these things yourself; imputation methods might be complex. Work with a statistician who can help you.

So the big question is: do the methods match the type of outcome data? This is not meant to be an exhaustive list, but just to give you a sense of some of the analyses I’m looking for depending on the type of outcome you have. Your outcome variable is going to drive the analysis. If you have a continuous outcome, so something like blood pressure or weight, and you’re comparing two groups, so you want to compare Group A to Group B, you could do something like a t-test. If you’re comparing three groups, so Group A to Group B to Group C, you want to use something like an ANOVA. You can also use a linear regression model in this case. I have this under the column of unadjusted analyses because all we’re interested in doing right now is comparing Group A to Group B. If we want to compare Group A to Group B adjusting for some other information, so adjusting for age or gender or whatever, then we have to move towards linear regression. I just wanted to point out that we can use linear regression for both unadjusted and adjusted analyses. A lot of people think you have to do a t-test or an ANOVA and then move to linear regression later; you can use linear regression for both sets of analyses. So is everyone okay with what I mean by unadjusted versus adjusted? Need any clarification?

All right, so again, it’s our outcome variable that’s driving the analysis. If we have a binary outcome, so death or no death, infection or no infection, and we’re comparing two groups, there’s a whole host of methods we can use. The most common are the chi-square test, Fisher’s exact test, and logistic regression. And again, we can use logistic regression for both the unadjusted analyses and the adjusted analyses.

And then the last most common type of outcome is time-to-event data. So time to heart attack, time to recovery, time to remission: these are all time-to-event outcomes, and we want to use survival methods. We have Kaplan-Meier, which gets us our survival curves. That’s a nonparametric estimate of survival, so it gives us those figures you’re probably very familiar with; we’re gonna see one in a couple of slides. But if we want to compare two or more groups on those Kaplan-Meier curves, we have to use a log-rank test. We can also fit a Cox proportional hazards model, both in the unadjusted and the adjusted case. There are lots of other survival methods out there; these are just the most common. So again, this is not an exhaustive list. These are the most traditional methods, and there are also a lot of nonparametric methods not listed here that might be of use to you.
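The slide table itself is not part of the transcript, so the following is only a sketch of the outcome-to-method mapping as it is described in the talk, written as a small lookup. The names `METHODS_BY_OUTCOME` and `candidate_methods` are hypothetical, and like the spoken list this is not exhaustive.

```python
# Outcome type drives the analysis; entries follow the methods named in the talk.
METHODS_BY_OUTCOME = {
    "continuous": {
        "unadjusted": ["t-test (2 groups)", "ANOVA (3+ groups)", "linear regression"],
        "adjusted": ["linear regression"],
    },
    "binary": {
        "unadjusted": ["chi-square test", "Fisher's exact test", "logistic regression"],
        "adjusted": ["logistic regression"],
    },
    "time_to_event": {
        "unadjusted": ["Kaplan-Meier curves", "log-rank test",
                       "Cox proportional hazards model"],
        "adjusted": ["Cox proportional hazards model"],
    },
}

def candidate_methods(outcome_type, adjusted=False):
    """Return the commonly used methods for an outcome type (illustrative only)."""
    key = "adjusted" if adjusted else "unadjusted"
    return METHODS_BY_OUTCOME[outcome_type][key]

print(candidate_methods("binary"))
print(candidate_methods("time_to_event", adjusted=True))
```

The regression model appears in both columns for each outcome type, which mirrors the point that one model can serve the unadjusted and the adjusted analysis.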

All right, so for me, when I’m reading the paper, I really need to know what the outcome is in order to assess whether the appropriate method was used. Any questions on this table or the analysis techniques?

So I’m gonna spend the most time talking about the common mistakes in the methods section. The first is that when conducting an RCT, for the most part it’s standard to carry out what’s called an intent-to-treat analysis, which means you’re gonna analyze everybody that was randomized, whether or not they completed the study, whether or not they actually completed treatment. If they’re randomized to Group A, you’re gonna analyze them in Group A, and if they’re randomized to Group B, you’re gonna analyze them in Group B. Missing data is a huge issue if you want to do an intent-to-treat analysis. So when I’m reviewing an RCT, I want to make sure that the authors actually carried out an intent-to-treat analysis. You can carry out other types of analyses as secondary analyses. If you want to do a per-protocol analysis, that’s fine, but I’m expecting that to be a secondary analysis, not the primary analysis.

Another really common mistake is that analyses presented in the results are not mentioned in the methods section. A really common example is the log-rank test. People write in the methods that they’re going to use Kaplan-Meier curves, and then they report a log-rank test in the results, but they don’t mention the log-rank test in the actual methods section. So if you’re going through your results section, make sure everything you report there shows up in your methods, at least somewhere.

I think this next one is the bane of many statisticians’ existence: multivariable versus multivariate. What do we mean by multivariable, and what do we mean by multivariate? Multivariable means we have lots of predictors in our model, so we’re adjusting for covariates. When we write multivariate, it means we have many outcomes that we’re modeling at the same time. So multivariable is many X’s, multivariate is many Y’s, and more often than not people are fitting multivariable models, not multivariate models. So make sure you’re not interchanging those two; they mean different things to statisticians. When in doubt, I would say go with multivariable.

Related to model building, you should be explicit about how you carried out your model building. How did you select what variables were going to be in the model? Did you use some sort of selection procedure? Did you use clinical judgment? All of that is fine, but you need to be explicit about it in the manuscript. And you should also clearly say what variables are in your model. Don’t just say “we adjusted for common covariates”; give the details. Questions on those first few common mistakes?

So as long as they haven’t been randomized yet, it’s okay. If you enroll them in the study and the statistician hasn’t randomized them yet and they drop out, that’s okay, because they’re not randomized. As soon as you press that button and you assign someone to a treatment group, you need to keep them in that treatment group for the duration of the study. They can drop out, they can stop taking the medication, they can die, and how do you handle that? Well, that’s a difficult issue, and you need to work with a statistician to figure that out. But if it’s before randomization, it’s okay. No, so it’s really from randomization forward; that’s where the clock starts. And in terms of the difference between the ITT, or intention-to-treat, and the per-protocol: one example of per-protocol would be including only people who actually take the full dose of treatment and excluding people who either drop out or switch arms or something like that. Those are useful and they have practical implications, but usually the primary analysis should be intent-to-treat. There are exceptions to that rule, but generally speaking ITT is the way to go.

So I’m hesitant to say yes. Personally, I would prefer that the process is driven by your knowledge of the problem. I think a good distinction is whether this is hypothesis-driven work or exploratory work. If it’s exploratory work, then let the data choose the model for you, but even then there are some caveats. You probably don’t want to choose the model and fit the model on the same data. This is the idea of prediction and validation, so having training data and then testing data. But if you’re doing hypothesis-driven research, hopefully you’ve already worked out some of those details in previous studies, or you have pilot data that informs the research process. So if it’s hypothesis driven, your methods should be specified ahead of time; you should be going into the paper already knowing how you’re gonna analyze the data. If it’s exploratory, then you can choose, you know, several different models and see which one does a good job, but then you have to be careful when you interpret it. Don’t overinterpret those results.

So your a priori analysis plan could say that you’re gonna check for a quadratic effect of whatever, and what you’ll do if that’s not significant. You can write that all up in your plan. As long as you specify all the steps you’re gonna take a priori, before you start digging into your data, I think it’s okay. But again, if it’s hypothesis-driven, hopefully you have some previous research that shows you that, you know, the shape of that thing is not linear, it’s some sort of strange function of time or something, whatever the example is. But again, even in that exploratory world, we want to be careful about using our data too much and overfitting our data. That’s always something we want to be careful about. Are there questions on this side of the room? Sorry, you guys are in my blind spot. So, great questions.

[waiting for questions in the audience]

Another really common mistake is that information is not given on how things are modeled. This might be obvious to you when you’re doing the analysis, but one common example is time. There are lots of ways to treat time, and by time I mean historical time. So if your study is from 2002 to 2010, how are you taking time into account in your modeling strategy? Are you treating it linearly, are you categorizing it, are you dichotomizing it? There are lots of different ways; there’s an infinite number of ways that you can include time in your model, so you have to be explicit about how you included it and potentially why you included it that way.

Another common mistake related to variables and how they’re modeled is categorical variables. When you’re modeling a categorical variable, you need to choose a reference group. So if you have, let’s say, treatments A, B and C, maybe you choose treatment A as the reference group. It doesn’t matter from a modeling perspective, but in terms of interpreting the results that the model produces, it’s really important to know what the reference group is. I can’t interpret the results if you don’t tell me that Group A was the reference group. And related to this, if you’re choosing Group A as your reference group, then you might give results for B versus A and C versus A. That’s sufficient information for me. So you give me the two comparisons when you have three levels.

Another really common mistake is that often we have data that are matched, repeated or correlated. We have before/after data, or we have data collected over time, so let’s say every month we collect data on the same people. We can’t analyze these data using those traditional methods that I presented in that table. All of those methods, so linear regression, logistic regression, the t-test, assume that we have independent observations. This is not the case when we have longitudinal or matched or repeated data. If that’s the case, you have to use a whole different group of methods. There’s the paired t-test and McNemar’s test; those are the simplest. But then there are more complicated things like conditional logistic regression, mixed effects modeling, and GEE. So definitely work with a statistician if you have repeated measures data. Don’t just use the off-the-shelf methods that you might have learned if you took an intro to biostats class; those are not going to be sufficient in that case.

And then another really, really common problem is that multiple comparisons are not accounted for. Let’s say you have four groups and you want to compare each of the four groups to each other. Every time you do a comparison, you risk making a type 1 error. So if you do that, I don’t know how many comparisons that is, but it’s a lot of comparisons, and there are lots of chances to make a type 1 error. So you definitely want to think about that and control your type 1 error rate. Again, it’s going to depend on whether you’re doing hypothesis-driven research, in which case you definitely want to control your type 1 error, or maybe exploratory analysis, in which case maybe you don’t need to be so concerned about that. But in general you should be aware of multiple comparisons and have a plan to deal with them. Questions on these mistakes?
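To put a number on the four-group example, here is a minimal sketch that counts the pairwise comparisons and shows how the chance of at least one type 1 error grows. The independence assumption behind the familywise calculation and the choice of a Bonferroni correction are assumptions for illustration; the talk does not name a specific correction method.

```python
from math import comb

def pairwise_comparisons(n_groups):
    """Number of distinct pairs when every group is compared with every other."""
    return comb(n_groups, 2)

def familywise_error(n_tests, alpha=0.05):
    """Chance of at least one false positive, assuming independent tests."""
    return 1 - (1 - alpha) ** n_tests

def bonferroni_alpha(n_tests, alpha=0.05):
    """Per-test significance threshold under a Bonferroni correction."""
    return alpha / n_tests

m = pairwise_comparisons(4)
fwer = familywise_error(m)
per_test = bonferroni_alpha(m)
print(f"{m} comparisons, familywise error about {fwer:.3f}, "
      f"Bonferroni threshold {per_test:.4f}")
```

With four groups there are 6 pairwise comparisons, and under these assumptions the chance of at least one type 1 error at the 0.05 level is roughly 26%, not 5%.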

[waiting for questions in the audience] right so these are all really difficult

questions don’t think that you need to tackle them on your own even the best

statisticians will have a tough time handling some of these things just be

aware that these are things that you have to worry about and that the

reviewer is going to be looking for so another really common issue is related

to sample size or having a small number of events so depending on what type of

analysis you’re doing sometimes the overall sample size is really important

and sometimes the number of events is important so I’ve given several

examples of this so if you’re you’re if you have a binary predictor and a Bryant

binary outcome you can make a 2×2 table you might want to use a chi-square test

that’s sort of the default method that people go to but if you have small

expected cell counts not going to get into what expected cell counts are but

basically if you have small counts in any of those cells you shouldn’t use a

chi-square test your chi-square test is based on some large sample assumptions

in that case you want to use something like Fisher’s exact test that’s

appropriate no matter what sort of sample sizes you have if you’re looking

at a continuous outcome that’s potentially skewed and your sample size

is small you might not be able to use your standard t-test anymore so you

might have to use a nonparametric method like the Wilcoxon rank-sum all right so

if you have small sample sizes you should be thinking about this and making

sure that you’re using appropriate methods and then the last point I want

to make about sample size so when you’re building models people are often tempted

to just throw variables in the model so if you collected it throw it in the

model the number of variables you can include in your model is actually

dependent on either the number of people in your study or the number of events

all right so if you’re fitting a linear regression there’s a rule of thumb that

you can have one variable in the model per ten people in your study just a rule

of thumb but that’s something to think about right so if you have a hundred

people in your sample you could include up to ten co-variants in your linear

regression model it gets a little bit more complicated for logistic regression

but it’s the same idea you’re going to be limited by your sample size so you

can’t put a hundred variables into your model so be careful have a plan for what

variables you’re going to include and why and again always be explicit about

that process so questions about the small sample size okay so the last sort

of common mistake that I wanted to mention is related to time to event data

so the first thing is you want to make sure you’re using survival analysis

methods so if you’re modeling time first heart attack time to readmission

any of those types of things that involve time to the event you want to

use a survival analysis technique alright so cup Kaplan-Meier is the most

common Cox proportional hazards those sorts of things related to how you write

up your methods section you need to be explicit about censoring so what is

censoring in your study how are people censored how are you gonna handle deaths

or drop outs all of that information needs to be given to the reader and then

another really important issue related to survival analysis is competing risk

so let’s say you’re modeling time to first heart attack and someone in your

study dies before they have a heart attack how do you handle that in your

ear methods alright so once they die they’re no longer at risk of getting a

heart attack what do you do about that can you use standard methods again these

are complicated questions you need to work with a statistician to help you

with this but just to be aware of these are the sorts of things you should be

looking out for questions alright so again there’s a whole field of

Statistics just dealing with competing risks so moving to the results section

right so the first thing I’m looking for is results for each of the objectives so

if you wrote in the introduction you have three aims I’m expecting to see

results for three aims in general do the numbers make sense do

they add up so what do I mean by this alright so if you’re reporting a

confidence interval and a p-value is there agreement between the p-value

and the confidence interval right so if the confidence interval contains the

null value right so if we’re talking about an odds ratio if the confidence

interval contains 1 then your p-value should not be significant right there

are very rare cases where this might not be true but in general we expect for the

confidence interval and the p-value to align so if I see that sort of

discrepancy I know that something went wrong somewhere and again now that’s

raising red flags in my brain about who did this analysis

were they careful I mean that’s not something you want the statistical

reviewer to be thinking about so do the results match those in the abstract and

the tables and figures again this is a sign of sort of sloppy work if you have

one set of results in the results section and then a different set in the

abstract or the tables and figures maybe you have one version you updated it and

then you forgot to update other parts all of this is giving me the sense that

maybe you weren’t very careful and then I start to wonder well were you not

careful with other parts of the analysis section I’m gonna go back and be even

more critical you know in the reviewing process so are you reporting

effect sizes and confidence intervals and not just p-values so if you’re

saying that there’s a significant difference between two groups don’t just

give me the p-value tell me what the mean is in Group one and the mean is in

group two and give me a confidence interval for the difference between two

groups a p-value is not very helpful to me other than it tells me you know your

p-value is greater than 0.05 or less than 0.05 in terms of having sort of

impact you definitely have to have those effect sizes in your paper and our

next seminar so a little advertisement is going to be just about p-values and

whether or not we should trust them and how should we report them and whether or

not they should be banned by journals so several journals have now banned

p-values outright my personal opinion is that’s a little extreme but you

definitely want to include effect sizes and confidence intervals and this also

applies to Kaplan-Meier curves so what do I mean by that okay
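
To make the Kaplan-Meier point concrete, here is a minimal sketch of what sits behind such a curve: the survival estimate, the number at risk, and a confidence band at each event time. The function name `kaplan_meier`, the follow-up times, and the event indicators are all hypothetical and only for illustration. The band uses Greenwood's variance formula with a plain normal approximation clipped to the range 0 to 1, which is an assumption; statistical software often uses a transformed interval instead.

```python
import math

def kaplan_meier(times, events, z=1.96):
    """Kaplan-Meier estimate with number at risk and an approximate 95% band.

    times  : follow-up time for each person
    events : 1 if the event was observed at that time, 0 if censored
    Returns one row per distinct event time:
    (time, number_at_risk, events_at_time, survival, ci_low, ci_high)
    """
    n_at_risk = len(times)
    survival = 1.0
    greenwood = 0.0  # running sum used in Greenwood's variance formula
    rows = []
    for t in sorted(set(times)):
        deaths = sum(1 for tt, e in zip(times, events) if tt == t and e == 1)
        leaving = sum(1 for tt in times if tt == t)  # events plus censored at t
        if deaths:
            survival *= 1 - deaths / n_at_risk
            if n_at_risk > deaths:
                greenwood += deaths / (n_at_risk * (n_at_risk - deaths))
            half_width = z * survival * math.sqrt(greenwood)
            rows.append((t, n_at_risk, deaths, survival,
                         max(0.0, survival - half_width),
                         min(1.0, survival + half_width)))
        n_at_risk -= leaving  # everyone observed at t leaves the risk set
    return rows

# Hypothetical data: eight people, 1 = event observed, 0 = censored
example_times = [5, 8, 12, 12, 20, 25, 31, 40]
example_events = [1, 0, 1, 1, 0, 1, 0, 0]
km_rows = kaplan_meier(example_times, example_events)

for time, at_risk, deaths, surv, low, high in km_rows:
    print(f"t={time:>3}  at risk={at_risk}  S(t)={surv:.3f}  band=({low:.3f}, {high:.3f})")
```

In this made-up example the number at risk shrinks as time goes on and the band gets wider, which is exactly the information a reader needs to judge how much weight to put on the tail of the curve.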

so that doesn’t look so bad up there right so here we’re looking at time to

some event so lung cancer survival so we have time on the x-axis we have

survival on the y-axis the first thing I’d like to point out is that the axes

are clearly labeled which is good and they’re comparing two groups so they’re

comparing that purple group to the blue group so females to males often people

will just include the dark lines they don’t include those confidence bands for

their estimates right and why are the confidence bands important why would

I care about those as a reviewer all right so what am I

looking at in terms of the confidence bands so do they overlap a lot do they

not overlap I mean I have my p-value there that’s not labeled but

I’m assuming it’s a log-rank test p-value so it’s telling me those two

groups are significantly different but what do you notice as we move further in

time what’s happening to our sample size how many people are still in our study

as we move further and further out there’s very few so this last estimate

at a thousand I’m not sure if it’s days or months or years because they didn’t

label their axes but we’ll get to that two people left two males and no females

so how much should I trust the end of that survival curve probably not very

much right and hopefully the confidence bands give you that sense right because

you can see how big the confidence bands get at the end right so always provide

those confidence bands that’s part one part two is to always include this table

here with a number at risk right so you need to tell the audience

well should they put a lot of weight at that end result so if there were 200

people left then maybe but with two people left I’m not gonna really pay

much attention to the end of the survival curve I’m much more interested

in the earlier part so again be honest it’s fine that there’s only two people

left but you have to tell the reader that questions about the survival curves [waiting for questions in the audience] for sure and I think it’s fine to report

the 0.06 but then provide that confidence interval right

so you’re telling me that the confidence interval goes from let’s say 0 to 10

that might be a very wide interval that might be a very narrow interval

depending on the thing that we’re talking about
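
As a rough illustration of reporting an effect size and a confidence interval alongside the p-value, here is a minimal sketch for the difference between two group means. The function name `mean_difference_summary` and the cholesterol-style values are hypothetical. It assumes a large-sample normal approximation so that it runs with only the Python standard library; with samples this small a statistician would normally use a t-based interval instead.

```python
import math
from statistics import NormalDist, mean, stdev

def mean_difference_summary(group1, group2, alpha=0.05):
    """Difference in means with a normal-approximation CI and two-sided p-value."""
    diff = mean(group1) - mean(group2)
    # Standard error of the difference, allowing unequal variances
    se = math.sqrt(stdev(group1) ** 2 / len(group1) +
                   stdev(group2) ** 2 / len(group2))
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    p_value = 2 * (1 - NormalDist().cdf(abs(diff / se)))
    return {
        "mean_group1": mean(group1),
        "mean_group2": mean(group2),
        "difference": diff,
        "ci_low": diff - z_crit * se,
        "ci_high": diff + z_crit * se,
        "p_value": p_value,
    }

# Hypothetical measurements for two groups
group_one = [210, 195, 188, 202, 199, 215, 207, 193]
group_two = [198, 185, 190, 200, 181, 192, 205, 187]
summary = mean_difference_summary(group_one, group_two)

for name, value in summary.items():
    print(f"{name}: {value:.3f}")
```

Because the interval and the p-value come from the same standard error, they agree by construction: the interval excludes zero exactly when the p-value is below alpha. Whether the size of the difference matters clinically is still a separate question.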

all right so a piece of information that would be useful there is well what is

the standard deviation for this particular measure right so is a 10

unit change important or is it not important so maybe for cholesterol a 10

point change is not meaningful but on some other scale it is so the clinical

significance part I’m relying on you guys for I can’t do that as a

statistician but that’s a conversation that the statistician and the clinician

need to have together and it’s difficult right so you get your p-value of 0.06

you’re like it’s so close right but just be honest and disclose what you did

provide the p-value give the effect sizes and in the discussion you should

say something about whether or not that’s clinically meaningful if it’s a

0.06 p-value and the effect size is really small then what does it matter

right but if it’s 0.06 and it’s a big effect size then maybe you need to

do future research to confirm that result it’s a hard balance but I

think the most important message that I could give you is just to be honest

don’t try to hide things from the reviewer again you don’t want to sort of

send up red flags because then they’re gonna go back and review everything that

much more carefully so I tried to sort of warn you of this as

much as possible because I see this all of the time right so don’t carry out

redundant analyses so what do I mean by this so I have a simple example here so

suppose you’re interested in the relationship between smoking status so

binary and heart disease so yes no binary first you do a chi-square test to

see if there’s an association between the two then in your next step you fit a

logistic model with heart disease as your outcome and smoking status as your

predictor and then you do a hypothesis test based on that logistic regression

model those two things are testing the exact same hypothesis so without

adjusting for any covariates is there an association between smoking

and heart disease generally the results are gonna agree there might be slight

discrepancies but you should state ahead of time which one is your primary

analysis but don’t report both as if they are two different findings again

that’s a huge red flag to me that says the statistical author on the paper wasn’t

heavily involved right because a statistician would know that those two

things mean the same thing and therefore you shouldn’t include both so just be

careful that you’re not including redundant analyses it will raise a red

flag if you do so tables and figures so after the abstract probably the most

important part if people are gonna read anything other than your abstract what

are they gonna do they’re gonna probably look at the tables and figures all right

so you want to spend a lot of time on these and make sure that they’re perfect

so make sure that they’re labeled if you’re using abbreviations throughout

the manuscript don’t assume that someone has read the manuscript spell things out

on the tables and figures again don’t just give me p-values I want to see

effect sizes I want to see test statistics degrees of freedom it’s going

to depend on the journal but not just the p-value if appropriate report column

or row percents people often interchange these and get them confused

make sure you’re reporting the right kind of percent and if you’re gonna have

plots make sure that the plot has a title make sure it has a y-axis and an

x-axis label make sure you include units right so we were just looking at that

Kaplan-Meier curve I didn’t know if time was days years months so make sure that

the reader knows what you’re talking about

even if they haven’t read your paper questions about tables and figures okay
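
Since row and column percents are easy to mix up, here is a minimal sketch of the difference. The helper names `row_percents` and `column_percents` and the counts are hypothetical; the table reuses the smoking and heart disease example from earlier purely for illustration.

```python
def row_percents(table):
    """Each cell as a percentage of its row total."""
    return [[100 * cell / sum(row) for cell in row] for row in table]

def column_percents(table):
    """Each cell as a percentage of its column total."""
    col_totals = [sum(col) for col in zip(*table)]
    return [[100 * cell / total for cell, total in zip(row, col_totals)]
            for row in table]

# Hypothetical counts.
# Rows: smokers, non-smokers. Columns: heart disease yes, heart disease no.
counts = [[30, 70],
          [20, 180]]

row_pct = row_percents(counts)
col_pct = column_percents(counts)

# Row percent: among smokers, what share has heart disease?
print(f"Smokers with heart disease: {row_pct[0][0]:.1f}%")
# Column percent: among people with heart disease, what share smokes?
print(f"Heart disease patients who smoke: {col_pct[0][0]:.1f}%")
```

The same cell answers two different questions depending on which total you divide by, so the table should say which one is being reported.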

[waiting for questions in the audience] so last section discussion so what am I

looking for here so are the conclusions in line with the data are you over

selling what you found are you being honest are the claims supported by the

results presented how are these new findings fitting into the literature do

they agree what other people found are these new findings what are the

limitations this is to me as the reviewer the most important paragraph in

the discussion so what are biases that you have to be worried about this might

be related to study design and then specifically related to statistical

issues if there are statistical issues talk about

things like low power how did you handle missing data I want to know how all of

those decisions might impact these results and avoid presenting new results

or data so if you have new results or data you’re putting them in the

discussion stop go back to the results section and update that first don’t put

new findings in the discussion section so sort of ran through all of the

sections of the manuscript I want to give you just some overarching guidance

in terms of responding to reviews so hopefully you submit your paper they

want a revision now you have to go point by point and address the reviewers

concerns I think it’s usually easy to identify the statistical reviewer is

that true can you guys usually tell when you read the reviews which one was the

statistician I would assume it’s pretty easy start by reviewing the criticisms

with your statistical collaborator hopefully you already have one so

hopefully this is not the point where you’re going and finding a statistician

to work with you don’t have to make all of the changes that the reviewers

suggest if you are gonna go against their advice be respectful and try to

provide references for why you’re doing what you’re doing

the statistician isn’t always right I put here misinformed I was trying to be

polite right so again sometimes the statistician or the statistical reviewer

is not actually a statistician by training

they’re a clinician who has a lot of experience in methods

so they might not know all there is to know about statistics they might not be

up-to-date on what’s the most current method again it’s fine to disagree be

respectful and try and provide support for why you’re making those decisions

and then just sort of as an aside from my experience a good editor will get a

second opinion so if the reviewer and the authors are

going back and forth about what is the best method or the best approach they

will often call in a second statistician to review the paper and give their

opinion so I’ve been what I call a tiebreaker before so the editor

contacted me and said hey the reviewer and the authors can’t seem to agree on

this I don’t know what to do with it can you please take a look and give me your

opinion so don’t worry if there’s this back and forth process as long as you

feel like you’re grounded in research so some final thoughts just to

wrap up the talk ideally you should begin working with a statistician

before you even start your study so before you collect any data come talk to

one of us have your statistical collaborator review the manuscript before

submission they’re a co-author on the paper you should be getting their approval

anyway to submit it include your statistician in the revision process

don’t go at it alone and then in terms of space so often when I’m working with

clinicians I’ve heard well we don’t have space to put all those details how do

you handle that right specifically related to the methods include an

appendix with all of the details and you could even include code in your appendix

which I recommend doing if possible so that someone can reproduce your results

reproducibility going forward is crucial so be as open as you can provide what

you can’t fit in the manuscript in an appendix good luck with your manuscripts

and thanks for listening and we have a few minutes for questions I think [waiting for questions in the audience] I think it depends on the statistician

reviewing the paper for me I’m a little bit

more pragmatic about it I think it’s okay as long as you’re looking at

what the effect size is and whether that trend is really

meaningful or not but some statisticians feel very strongly that you shouldn’t

even say that at all