NINR Big Data Boot Camp Part 4: Big Data in Nursing Research – Dr. Patti Brennan



[music playing]

>>Mary Engler: I’m really
pleased now to introduce our next speaker, Dr.
Patricia Brennan, who will be speaking on big
data in nursing research. Dr. Brennan is the Lillian
S. Moehlman-Bascom Professor in the School
of Nursing and College of Engineering at the University
of Wisconsin, Madison. She — good — she is also
the leader of the Living Environments Laboratory at
the Wisconsin Institutes for Discovery. She has been recognized for
her significant research with several awards and is a
fellow of both the American Academy of Nursing and
the American College of Medical Informatics. Dr. Brennan is also an
elected member of the Institute of Medicine and
the New York Academy of Medicine. And I just want to say
remember last year when we had everyone do the
sit-ups and all that? We aren’t doing that this year.

>>Patti Brennan: We’re not going to do it?

>>Mary Engler: No, no, no, yeah, so –

>>Patti Brennan: You sure?

>>Mary Engler: Yea [laughs].

>>Patti Brennan: You positive? I’ve got a new one this year.

>>Mary Engler: Okay, let’s try it. Okay, please join me in a warm welcome for Dr. Brennan.

>>Patti Brennan: Thanks very much. Thank you, Mary. [applause] So she
took away my thunder. She warned it. This is — my talks are
characterized by audience participation and this
is our first audience participation activity. Everybody, take
your right arm, touch your left shoulder. Take your left hand and just
gently pull on that arm. [laughter] Now come back down, do a
normal pose, take your left arm, touch your right
shoulder, and bring that left — the right hand up —
sorry — and just pull. Did I screw it up? And then rest. All right. Feel better? Okay, so this — I thank the
planning group particularly for asking me to come
back because this is a great opportunity. It’s nice to see some
friends out here. I appreciate that. I am standing between
you and lunch. [laughter] I am feeling the power. However, you are standing
between me and the airport. [laughter] So I just need to let you
know I am jetting out the door at 1:15 because, as
some of you know, tomorrow is my birthday. I love my birthday. [laughter and applause] Thank you. And my son is waiting to
take me out to dinner. And those of you who’ve
raised boys know that when your son says, “Mom I’ll
take you out to dinner for your birthday,”
you don’t go late. [laughter] You just go. Now lest you think I have
a mutant child from the netherworld, I wanted you to
know that what he said was, “Mom I want to take you out
for dinner on your birthday and I have a date the night
of your birthday, so could we go the night before?” [laughter] Did I need this? Did I need this? Could we have my
slides up in a minute? Any moment they’ll
be here, I’m sure. So while we’re getting ready
let me first of all greet the people who are listening
on the Web and I hope you did your conference
calisthenics also because they’re
very important. Oh. There you go, there’s me. I want to say that
I’m going to talk about three
things today. I’m going to talk about
some motivational aspects of big data. I’m going to talk about
some technological aspects. And you’ll see our talks
weave together very nicely in many ways and don’t weave
together in other ways, so you’re going to hear a
little bit of a different perspective from me. And then I’m going to
talk to you about some applications of data science
and Big Data in Nursing. I want to remind everyone
this is an audience participation activity so
you’re going to have a couple things to do. So don’t get too kind
of comfortable. You might have seen me
knitting up here earlier. I do that so I
don’t do my email. But your email can
wait for the hour. Let me begin though by
asking you to just take a minute and find one person
sitting within one chair of you that you don’t know and
introduce yourself because you’re going to be here all
week anyway, ladies and men, so you might as well find
somebody to say hello to. [talking simultaneously] Okay, stop. Doesn’t work. It used to work much
better than that. Okay ladies and men, oh I
can see it right now, this is going to be major
audience participation. I don’t have to do
much talking today. Thank you everyone. Some of you know I’m
the second eldest of 10 children. I’m now aunt to 37 nieces
and nephews and I will tell you our family dinners
are way noisier than you people are. [laughter] But if you ever want to
know what big data is, we just saw some
of it right there. We saw motion, we saw
sound, we saw unstructured communication, we saw
meaningful engagement. Someone might have been
using that as a benchmark to see is this going
to start well? Is it not going
to start well? Do I feel good? We are constantly
processing data. And you were just
a part of it. So we’re already done most
of our lesson right now. [laughter] But remember the amount of
data in the world doubles every 18 months
to two years. It doubles every 18
months to two years. You cannot remember
everything. You cannot store
everything. You cannot know
everything. And, in fact, as the data
doubles, as Eric said earlier, our understanding
of what that data meant, even six months ago,
actually is always changing. So the ideas that we had for
much of our careers that we could understand, we could
know, we could master, are now giving way to the
possibility that we could wonder, we could conjecture,
and we could have a direction, but not
necessarily a mastery. This really changes the game
plan a lot and you’ll hear me return to this theme over
and over in my talk today. We must make sure our
patients understand how the game has changed because the
confidence that they once placed in us in knowing, in
mastery, we now have to help them understand and
learn there can still be confidence in
wondering, in conjecture, and in reasoning. So we need to remember
patients need to know there is not an answer that will
be handed to them, open the envelope and you’ll
find the magic term. There will be journeys and
pathways, and our process toward care is really
quite different. Now as the amount of data
doubles every 18 months to two years we can feel
overwhelmed with it. It’s very easy to
be fatigued. And as Bonnie said a few
minutes ago, what we’re talking about today is in
spades a team sport. We must be able to rapidly
engage with others who know and can help us know
because we will never know all ourselves. As this data is doubling
somebody has to store it, somebody has to curate it,
somebody has to make it accessible to us, visualize
it, make it known. And the rules of what’s
accurate, what’s right, what’s the proper basis for
decision-making, are also shifting with this. So it’s an
exciting time. This is not meant to scare
anyone, but it’s really meant to say if you’re going
to invest your energy in something, invest it in
weeks like this where you learn about reasoning as
opposed to right answers. Now we’re going to talk
about the themes that you’ve been hearing already through
the morning: big data, the BD2K initiative, data
science, and the Precision Medicine Initiative. [coughs] Excuse me. It’s astounding to see how
aligned our comments are going to be. You’ll be amazed. You’d think we organized
this, but not really. We do have to think —
remember these mean lots of different things to lots
of different people. And I want you to keep
coming back to make sure you begin to develop your own
familiarity with it. What is big data? Lots of things. It’s like the elephant and
the six blind men. We all think we know exactly
what it is because we got a hold of the tail or the
side of the elephant or the trunk. We know what it is and those
other people, well they may not be wrong but
they’re sure not right. And in nursing
don’t we love that? I mean, don’t we
love being right? Get rid of it, it’s
not useful anymore. BD2K is an initiative. It’s actually been — it’s
taken a firm hold here at NIH — the big data to
knowledge initiative is now over two years old. There’s been some nursing
participation and there needs to be continued
nursing participation. I’m going to talk a little
bit about some of the aspects that I’ve been
involved with. But Eric presented it in a
way that I found very interesting today, which is
that when we first started the BD2K initiative and it
was first set out on the horizon, we all knew what it
was, we just had to come together and have a
conference about it to codify it. And now people are
backtracking a lot and saying, “Wait, wait, wait,
wait — wait a second.” First of all, we can’t know
what the big data to knowledge initiative is
until we know more about big data and how it
gets to knowledge. So we need to build
infrastructure. And part of that
infrastructure is building talent and building staff
and building engaged people. We need to also
build capacity. So before we start sticking
out the first R01 to build — to do data science
studies, we have to actually build the data repositories
or build the algorithms to make the data there. So the initial enthusiasm of
two years ago is still there, it’s just hidden
underneath the realities that big data is
hard to work with. Data science — again, we
all know — every university in the room, I’m sure, every
university represented here now has a new major in data
science or a new program in data science, or maybe they
have two programs like we have at Wisconsin. We have actually seven
programs in data science, one in the business school,
one in engineering, one in computer science, and one in
population health studies, one in biostatistics and
informatics, one in statistics, and one floating
freely in the clinical investigation group. So everybody has the right
answer for data science. Again, we’re going to talk
today, we’re going to be hearing for the rest of the
week of a number of ideas of what is data science,
what constitutes it? The Precision Medicine
Initiative, well we know what that is because the
President told us, right? That makes it
so much easier. [laughter] But I got sort of peeved at
these people because it used to be personalized medicine. Like don’t you remember
personalized medicine? Like a week ago it was
personalized medicine? And where does precision
stuff come in? So I — we were — Sue and I
were at a meeting a couple weeks ago at Harvard with
Zak Kohane and some of our colleagues who know what
Precision Medicine means, like we don’t. And I said, “Okay, so where
did this come from because it didn’t used to be
personal — precision was personalized, and by the
way, that medicine term, we still don’t like
it very much.” [laughter] Oh, so now — and I have to
say I was chagrined to learn the report that Eric
referred to, the National Academy’s report on
Precision Medicine — which is already three-and-a-half
years old — actually lays out what Precision Medicine
is and why it’s precision as opposed to personalized; the
idea here being the choice was made systematically by
the groups of scholars who developed that document,
which then led to the President’s embracing of the
term, that we wanted to assure that through the
morass of data we found precise enough
answers for patients. So the idea was
aspirational. So I give them
a little slack. It’s not a bad term. I’m not thrilled with it,
but it’s not a bad term. Okay, made a lot of
progress, lots of fast progress, but we really
have a long way to go. And the reason why I hope
all of you are here today is because without the nursing
workforce, we can’t get the value of Precision Medicine
and data science into the hands of people. So we need to be here, whether as
clinicians, investigators, teachers, or students, we
must be thinking about data and data science as another
set of tools that nursing has to understand the human
experience and to help people respond to the
challenges of growth and development of
illness and disease. All right. My goals today are to
convince you that big data approaches require nursing
and nursing requires big data approaches. And within about two weeks
or so Sue and I will have a new piece coming out the
“Journal of Nursing Science” that actually we hope lays
out this argument very clearly and gives you some
references and citations. The second one is the
most important thing. Questions get us to data and
data gets us to answers, sort of. And as Bonnie stressed,
analytics matter, but questions are the most
important thing that we need to bring to the party here. There’s plenty of data. There’s always going
to be plenty of data. But questions, why do you
want to know, why do you want to know that, who’s it
going to matter to, how’s it going to build
our knowledge. That’s our job. And success comes from
leveraging investments and creating scalable
knowledge-building teams. Okay. What makes data big? So everyone has their
own view of data. This is the one I stole. First, volume. You’ve been hearing a lot of
— big data is lots and lots and lots of data. It’s lots and lots of data. Our group says, yes, big
data is — sorry — lots of data, the volume of data
that there is there is an important characteristic
of big data. But we actually have a bit
of a caveat about what constitutes lots of volume
and lots of volume is something that doesn’t fit
on any single computer. So when we’re trying to
process some of the work I’ll tell you about in a few
minutes, we need a cluster of computers because all the
data that we have cannot fit in a memory — cannot
fit into one storage area at once. Velocity — you saw the fire
hydrant that Eric — in Eric’s image, the
velocity, the data’s coming at us so quickly. Think of the number of
heartbeats an infant has in the ICU. That’s the velocity that’s
coming at you so fast. What do you do with it? It’s even — by the time
you’ve captured one the next one is there, or maybe three
observations down the line. But it might be important to
capture every one of those so the velocity is a lot. There’s a great
variety of data. And Bonnie and Connie and
their group are arguing strongly for starting
off with principled terminologies, which
is really critical. But the fact of the matter
is it’s never going to happen fast enough
for what we need. So we need to think about
the variety of data: image data, genomic data,
motion data, sound data, engagement, all the data
that comes out of groups like the sound in this
room when we were all introducing ourselves. People talk about veracity
in a couple of different ways and I want to make sure
that this point comes through really
clear to you. Data lie. Data lie. [laughter] All right? Maybe they lie because
they’re imprecisely measured. Maybe they lie because we
thought they were this and they were actually that. Heartbeat or stress, what
does exactly that mean? Veracity, the truthfulness
of data, is a characteristic in part of its use,
not its source. We cannot be sure that every
data element coming at us, say heartbeat, is coming
from the same highly-precise sensing system that
actually has been calibrated and isn’t influenced by the
temperature of the air or anything else in the room. So we have to wonder even if
the data points we get are actually accurate. So as you use data in all
sorts of different ways — and remember we’re talking
about using goo-gobs of data here — we’ve got to think
about remembering that it may not be precise. We might have a
margin of error. We might have distributions
of estimates, not actual data elements. And then there’s a group of
people who want to argue that data is big only
if it has value. I don’t like those people. But there are a lot of
people who say we should pay attention to what matters. Well, of course, but
you know what, I have conversations with
colleagues to talk about the patients’ response to their
— the variability in their blood sugar as being
disease management. And I talk about it being
symptom management. And they tell me, “Oh
no, you’re wrong, that’s disease management. What a person does to comply
with their medical orders is disease management.” We talk about things
differently and the valuing is different. So I value a person’s
ability to understand and interpret their symptoms
and act on them. I don’t really think of
that as disease management. That’s what CMS pays for. That’s what a person does
every day in their lives. That’s what I value. So our valuing structure —
and Eric’s answer to my question about pharma was really right. It’s a societal conversation
and you should be having it every day: in your back
yard, at schools, in your letters to the newspaper,
constantly having the conversation to improve the
value of understanding the human experience and our
response to it to our society as a whole. But since really what we
talk a lot about big data is volume, let’s go and look
at some volume issues for a minute. So how big is big? How big is big? I can tell you if my
slides would change. [laughter] Okay, slides. There you go — 6.9
petabytes is the amount of data that’s at Kaiser
Permanente, 6.9 petabytes of data is how much is in their
health information system right now. The number that you see
below there is 7 point — 7 petabytes. That’s a lot of data and yet
that’s what’s in — that’s actually what was in their
information system last year, so you can imagine how
much more is in there this year since it’s doubled —
it’s 50 percent greater. The Library of Congress has
about now 8 petabytes of data, but it’s at rest. It’s only sitting in the
Library of Congress. Nobody’s trying to merge it
together, nobody’s trying to look at trends for it. So data can be huge. The Library of Congress data
is useful to think about because we often use that as
a model of clinical records. We’ll build a repository
where the data will sit. And yet you and I know more than ever that the only value of an electronic health record is
when that data’s in motion, when it’s useful for
something, when it’s used for something. So we need to be thinking
not about how do we store, which in my generation of
informatics was the big issue — storage, we’re
going to run out of storage. We’re not going to run out
of storage anymore, but how do we make it moveable? If it’s too big to fit on
one computer and it can’t fit into the active
memory, how do we make it useful to people? I don’t have that answer, by
the way, but you have a homework assignment
to figure it out. [laughter] Facebook stores 300
petabytes of data. That’s 50 times the entire
Library of Congress. And they add a half a
petabyte a day — a half a petabyte a day, all right?
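To make those orders of magnitude concrete, here is a quick back-of-the-envelope check of the figures in the talk; the arithmetic is purely illustrative.

```python
# A quick scale check on the figures from the talk. One petabyte = 10**15 bytes.
PB = 10**15

facebook_store = 300 * PB      # "Facebook stores 300 petabytes"
daily_growth = 0.5 * PB        # "they add a half a petabyte a day"
print(facebook_store / daily_growth)   # -> 600.0 days to add that much again

# "The amount of data in the world doubles every 18 months":
data = 1.0                     # arbitrary starting amount
for year in range(6):
    data *= 2 ** (12 / 18)     # one year of growth at an 18-month doubling rate
print(round(data, 1))          # -> 16.0 times as much after six years
```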
So there’s a lot of data out there and some of it’s about you. [laughter] So you have to think about
how does that lead to your health data? How does that lead to — I
don’t know, but these data numbers, these are massive
numbers, all right? We are not going to build
T-tests around them, I guarantee you. [laughter] We are not going to do that. So all that Connie was
saying about needing new algorithms, saying, “Boy do
we need this, we need this.” All right. So let’s take a little look. I love this movie. It’s an old movie. I hope you’ll bear
with my desire to show it to you because — [music playing] This is on the [inaudible].

>>Male Speaker: The picnic
near the lakeside in Chicago is the start of a lazy
afternoon early one October. You begin with a scene one
meter wide, which we view from just one meter away. Now every 10 seconds we will
look from 10 times farther away and our field of view
will be 10 times wider. This square is
10 meters wide. And in 10 seconds the
next square will be 10 times as wide. Our picture will center on
the picnickers even after they’ve been lost to sight. One hundred meters wide,
the distance a man can run in 10 seconds, cars crowd the
highway, power boats lie at their docks, the
colorful bleachers of Soldier Field. This square is a kilometer
wide, 1,000 meters, the distance a racing car can
travel in 10 seconds. We see the great city
on the lakeshore. Ten to the fourth meters, 10
kilometers, the distance a supersonic airplane can
travel in 10 seconds. We see first the rounded end
of Lake Michigan, then the whole great lake. Ten to the fifth meters, the
distance an orbiting satellite covers in 10
seconds, long parades of clouds, the day’s weather in
the middle west. Ten to the sixth, a
one with six zeroes, a million meters. Soon the Earth will
show as a solid sphere. We are able to see the whole
Earth now, just over a minute along the journey. The Earth diminishes into
the distance but those background stars are so much
farther away that they do not yet appear to move. A line extends at the true
speed of light. In one second it
half-crosses the tilted orbit of the moon. Now we mark a small part of
the path in which the Earth moves about the sun. Now the orbital paths of the
neighbor planets, Venus, and Mars, then Mercury. Entering our field of view
is the glowing center of our solar system, the sun,
followed by the massive outer planets, swinging
wide in their big orbits. That odd orbit
belongs to Pluto. A fringe of a myriad of
comets too faint to see completes the solar system.>>Patti Brennan: We’re
getting up to about a petabyte now.>>Male Speaker: Ten
to the fourteenth. As the solar system shrinks
to one bright point in the distance, our sun is
plainly now only one among the stars. Looking back from here we
note four southern constellations, still much
as they appear from the far side of the Earth. This square is 10 to the
sixteenth meters, one light year, not yet out to the next star. Our last 10-second step took
us 10 light years further. The next will be 100. Our perspective changes so
much in each step now that even the background stars
will appear to converge. At last we pass the bright
star Arcturus and some stars of the Dipper. Normal but quite unfamiliar
stars and clouds of gas surround us as we traverse
the Milky Way Galaxy. Giant steps carry us into
the outskirts of the galaxy. And as we pull away we begin
to see the great, flat spiral facing us. The time and path we chose
to leave Chicago has brought us out of the galaxy along a
course nearly perpendicular to its disk. The two little satellite
galaxies of our own are the clouds of Magellan. Ten to the
twenty-second power, a million light years. Groups of galaxies
bring a new level of structure to the scene. Glowing points are no longer
single stars, but whole galaxies of stars
seen as one. We pass the big Virgo
cluster of galaxies, among many others, 100 million
light years out. As we approach the limit
of our vision, we pause to start back home. This lonely scene, the
galaxies like dust is what most of space looks like. This emptiness is normal. The richness of our
own neighborhood is the exception. The trip back to the picnic
on the lakefront will be a sped-up version, reducing the distance to the Earth’s surface by one power of 10 every 2 seconds. In each 2 seconds we will appear to cover 90 percent of the remaining distance back to Earth. Notice the alternation
between great activity and relative inactivity, a
rhythm that will continue all the way into our next
goal: a proton in the nucleus of a carbon atom
beneath the skin on the hand of a sleeping man
at the picnic. Ten to the ninth meters, ten
to the eighth, seven, six, five, four, three, two, one. We are back at our
starting point. We slow up at one meter, ten
to the zero power. Now we reduce the distance
to our final destination by 90 percent every 10 seconds,
each step much smaller than the one before. At ten to the minus two, one
one-hundredth of a meter, one centimeter, we approach
the surface of the hand. In a few seconds we’ll be
entering the skin, crossing layer after layer from the
outermost dead cells into a tiny blood vessel within. Skin layers vanish in turn,
an outer layer of cells, felty collagen, a capillary containing red blood cells and a ruffly lymphocyte. We enter the white cell.
the porous wall of the cell nucleus appears. The nucleus within holds the
heredity of the man in the coiled coils of DNA. As we close in we come to
the double helix itself, a molecule like a long,
twisted ladder whose rungs of paired bases spell out
twice in an alphabet of four letters the words of the
powerful genetic message. At the atomic scale the
interplay of form and motion becomes more visible. We focus on one commonplace
group of three hydrogen atoms bonded by electrical
forces to a carbon atom. Four electrons make
up the outer shell of the carbon itself. They appear in quantum
motion as a swarm of shimmering points. At ten to the minus ten
meters, one angstrom, we find ourselves right among
those outer electrons. Now we come upon the two
inner electrons held in a tighter swarm. As we draw toward the atom’s attracting center we enter upon a vast inner space. At last the carbon nucleus,
so massive and so small. This carbon nucleus is
made up of six protons and six neutrons. We are in the domain of
universal modules. There are protons and
neutrons in every nucleus, electrons in every atom,
atoms bonded into every molecule out to the
farthest galaxy. As a single proton fills our
scene, we reach the edge of present understanding. Are these some quarks at intense interaction? Our journey has taken us
through 40 powers of 10. If now the field is one
unit, then when we saw many clusters of galaxies
together it was 10 to the fortieth, or a one with 40 zeroes.

>>Female Speaker: That’s great music, isn’t it?

>>Patti Brennan: [laughs]
That’s Elmer Bernstein’s music. So this movie began in the
’50s, people trying to understand volume, trying to
understand what did these massive numbers mean? And it’s essential for us
now to begin to understand this because what we have is
research going on in nursing, by nurses, for
patients that take advantage of millions of data points. In my own research we’re
trying to understand how the interiors of households
shape how people manage health information. So we take a household up on
the upper left-hand corner and we digitize it using
laser scanning and then replay it in a cave on the
lower right-hand corner — 950 million data points for
every single house once. And if we wanted to study
that house over a day or a week or a year or a person’s
life, we don’t have the capability right now to
capture the data. But what we do have is the
beginning of the questions: how does knowing the precise
location of objects in a house help us better help people manage their health information? And that is what we bring to
the conversation as nurses. We bring the questions. Now NIH thinks big data
comes only from three places. First of all, from projects
that are produced to — that produce important resources
for the community. So they’re intended to
build large datasets. And the undiagnosed disease
network that Kohane’s group is putting together is one
of those resources. Chris Chute’s group is
another group trying to look at how do we build massive
data repositories, or at least indexes of data. Additionally there are large
datasets useful for individual projects, which
might ultimately be broadly useful to the research
community, and that’s how — where I would put myself. Without hubris I would say
when we capture these houses we have a research utility
that could be used by architects, it could be used
by physical therapists, it could be used by people who
study color in motion. And so we’re going to be
distributing our datasets through open source on
the Internet when our research is done. So there are — it’s
important to think at the beginning of a project,
especially when you’re generating a lot of data and
our work, thank you very much, cost $2.5 million
— our work should be useful to other people. So you’ve got to start at
the beginning if you’re going to create a
big data resource. And finally, small datasets
whose value can be amplified by aggregating or
integrating them with other data, and you’ll be hearing
some examples of that also. So big data actually comes
from a lot of different places, but this
is not enough. NIH says the mission of the
BD2K initiative is to enable biomedical scientists to
capitalize more fully on the big data being generated by
those research communities. And I say, yes, but
that’s not enough. We need to think about
other places that big data come from. We need to think about what
our clinical care records are going to be providing
that complement and can be leveraged with
research-based data. It’s dirtier than
research-based data. We all know those electronic
health records are not quite there where we’d like them
to be yet, but they’re getting closer. But claims data, billing
data, health information exchange data, all come from
a resource that must be leveraged for knowledge,
just as Eric and Pat were saying, that NIH’s mission
is to create knowledge for health, we need to feed health back into knowledge. So we need to be working in
both directions. Biorepositories,
repositories of biological samples, are an
aspirational view. It’s starting to be
built in some places. You’ll hear from Sue Bakken’s
work this afternoon. The environment surrounding
people, the temperature in this room, the sound, the
aroma coming from the perfume of the person next
to you, are all aspects that are data sources that you’re
constantly influenced by. If you’ve got an allergy
you might be more influenced than others. Understanding how clutter
shapes health behaviors in the home is a really
important process that our group is doing. The environment around
people is becoming, if you will, front
and center stage. And then finally,
patient-defined and patient-generated data, data
that people know about themselves, whether it’s the
ability to fit in their skinny jeans or the ability
to pick up their child or to sing all the way through the
hymn in church on Sunday, they are data observations
that people make about themselves that have, when
you look at them over time, the same characteristics of
big data and certainly could be integrated with them. So I present to you the
recognition that nurses know we need more than these
small elements of data. And so for the rest of my
time we’re going to be talking about nursing big
data and the patient experience because really
that’s what we’re here for. I want — now here’s
our second audience participation moment: what
questions would you answer if you had immediate
complete access to all of the data that you wanted to
have available to answer that question? So take a minute — you
don’t have to turn this in. You don’t have to tell your
neighbor, but take a minute and write it down. What question would you
answer if you could get all the data at once if you had
all those computer scientists that Bonnie
worked for, if you had those big clusters of
condra-processors [phonetic sp]? And I want you to think
about that question for a minute and I want you to
think about whether it is a question that is relevant to
your current research, that is, does it build on
what you’re already investing your time in? Or is it a speculation of
something new that you never envisioned you could
possibly answer, but now you might be seeing some
pathways towards it? One of the things that we
have, an unlimited resource within nursing, is
the ability to speculate about people, our
understanding of their experience and the things
that they need to make that experience the most
healthful possible. Data and big data and data
science should support our trajectory towards helping
people live the way they can live best. So nurses deal with a lot
of things about people. They deal with signs and
symptoms and that’s probably the most familiar under
our symptom science initiatives here. Observations of daily living
is something that came out of our group, the Project
Health Design Group, about what do people pay attention
to in their own lives. People also attend to images
and the first thing I’ll bet every one of you
thought about was an x-ray or an fMRI. And that’s a good thing and
I’m glad you’re thinking about that, but I
also want you to think about family pictures. And I also want you to think
about movies. And I also want you to think
about what we know about images that tell us about
family bonding or a person’s ability to walk, which might
be a better data source than our ability to count steps,
all due respect to Fitbit. Biomarkers — we have — our
biomarkers are driven by science not by the
experience of patients, but maybe they could be: family
dynamics, patient experience, population
phenotypes, understanding how diversity is manifest
in a population. Nurses seek normal. That’s one of the traditions
we’ve always had, find what is normal for that person or
that group and go after it. We’re still hearing from our
Precision Medicine colleagues, “We’re looking
for the abnormal. We’re looking for the one
gene inside that little cell to figure out what’s wrong.” And that’s important, but we
also think about how do we build a system of health
around the system of normal? Biological specimens —
every day we deal with lots of different data types. And when we try to do that
we need to think about them within the learning health
system that we are now increasingly
becoming a part of. The learning health system
is enabled by big data and data science. We are able — Bonnie
talked a little bit about extracting things from the
clinical record to understand in the
moment what’s going on with a patient. I’m going to give you some
more examples of that in a few minutes. But the learning health
system is characterized by structured knowledge bases
and unstructured data, and really is — I forget if it
was Bonnie or Eric that said, “This is like a 10/90 relation, that is, 10 percent of our data structured in a
way that somebody once thought was the right way to
structure but it might need to be structured
differently, and 90 percent of it is unstructured.” And most of what’s
unstructured isn’t even written down or codified
in our electronic health records. So think about the sigh that
a patient gives after being told about a diagnosis or
the ability to move more freely after surgery. Those are things that
somehow never get into the clinical record. So we need to be — we have
the ability, we have increasing technological
support for capturing data types that we never
thought of before. And let me just tell you,
you don’t want the technologist to decide what
data nursing’s going to use for their practice. You don’t want
them to do it. They’re nice boys, but you
don’t want — you do not want someone outside of our
discipline to define our discipline. And so as we start thinking
about data and sensors, if one more person comes to me
and tells me they can get a bed sensor for a patient’s
weight, I’m going to tell them, “You know, if you’re a
great family, there’s four or five people in that bed
at least once a week, and what is your weight sensor
going to do when four or five people pile into mom’s
bed?” [laughter] I mean, really let’s
think about what matters to people. Anyway, in the learning
health system we want to support learning and
self-correction, so predictive models for
patient-centered treatment are important that
are generated by characteristics of
the encounter. Now there is some hope that
sensor-based data will give us unfettered experiences of
the patient as opposed to clinician-interpreted data. So when I write that the
patient is — what’s my favorite one — well the one
I hated was the slightly obese middle-aged woman. But there was another one — [laughter] — about well nurse, well
nurse, that’s the clinician’s assessment, if
I’m a well nurse, how the hell do you know that? We have to think about what
sensors are going to give us in terms of new ways of
knowing people without the biases of human perception. Now that’s going to require
a change in our processing of information because we
have the ability now to take data without somebody
saying, “Yes, I believe that data’s true and in fact
about that person.” That’s — Sue’s going to
talk about that later, too. I’m not going to talk about
it now. But remember we have sources
of data that come from encounters and we also from
those generate our prediction rules, but we can
also generate understanding about our treatment
outcomes. And increasingly we’re
moving towards — because of the tight cycle between the
experience of a patient, the desire to understand a
learning health system, and the ability to make
predictions about what that experience means. We’re going to become more
experimental in our approach to patient care, more short
cycle engagement that will ultimately lead us to
improve the health care delivery system, but feed
more data back into it. So, as we think about big
data, and nursing, and learning health systems, we
need to think of it in part as this system of
information flowing through, and around, and about the
patient and the caregivers, but we don’t have to think
about this all by ourselves. So don’t — if you’re like,
getting exhausted already, don’t worry. We should not. We should not waste nursing
time becoming data scientists, not all of us. Maybe you know, five or 10
percent of you, great, but we do not want every nurse
to now become a data scientist because then
we’ll lose what you know about nursing. So, we need to find
partnerships. We need to find them
together, and we also need to find — do you know, as
someone asked Bonnie, “How did you know who to go to?” We need to begin to learn
the language that helps us get the resources
that we need and we want from others. So, in your folder
you’ll see this diagram. It’s dense. It’s a good diagram. It was built by
a guy named Swami Chandrasekaran. I always have to practice
his name, and I want to take you through this, because
this will tell you what data scientists and big data
managers can do for you. So, if you begin on the
lower left-hand side, we have the fundamentals. And in that fundamentals
area there are things like basic mathematics, data modeling, reporting, and statistics. Moving around now through
the blue area, we see statistics, and as Bonnie
mentioned, a lot of the things we knew we’ve already
known can be leveraged in the new data initiative
models. And programming, which
is often outside the scope of nursing practice, but
something that is actually useful as we begin to
understand not how to program, but what
programming does for us. Data munging is probably the
most unfamiliar term to you, and this is a process
of cleaning data. And Bonnie said, “Eighty to
90 percent of your time goes into cleaning data.” I might have said 96 to 98
percent of your time goes into cleaning data. Understanding where it comes from, understanding how to deal with uncertainty built into the data, managing missingness, the temporality of data — how do you determine whether a data element is incorrect or has just changed over time? So, data munging is a set of
tools that do that, and data ingestion because you cannot
bring every — if you’re going to work with big data
sets, remember petabytes? You can’t bring it all into
one processor at one time. So, you need to think about
how do we divide up data to be able to analyze it. And Bonnie mentioned a part under the PCOR initiative, which I’ll also talk with you about, that is allowing analysis to be done in a distributed fashion and then bringing it together.
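As a minimal sketch of what munging plus chunked ingestion can look like in practice, here is a pandas example; the file name and column names are hypothetical, not from the talk.

```python
import pandas as pd

# Read a (hypothetical) vitals file in chunks so the whole dataset
# never has to fit into memory at once.
chunks = pd.read_csv("vitals.csv", parse_dates=["recorded_at"], chunksize=100_000)

cleaned = []
for chunk in chunks:
    chunk = chunk.drop_duplicates()              # remove duplicate records
    chunk = chunk.dropna(subset=["patient_id"])  # a record without an ID is unusable
    # Flag implausible heart rates instead of silently dropping them;
    # "incorrect" and "changed over time" are different problems.
    chunk["hr_suspect"] = ~chunk["heart_rate"].between(20, 250)
    cleaned.append(chunk)

vitals = pd.concat(cleaned).sort_values(["patient_id", "recorded_at"])
print(vitals["hr_suspect"].mean())               # share of suspect readings
```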
Big data utilities are sets of utilities that, for the most part, allow us to understand, store, and locate data and to maintain its privacy and security. In the lower left-hand side you’ll see things referring to Hadoop and MapReduce. These are two utilities from
Apache that were developed specifically to manage large data that is distributed over lots of different computers.
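The MapReduce idea itself is simple enough to show in a few lines of plain Python; this is a toy sketch only, with made-up note fragments standing in for the partitions a cluster would hold.

```python
from collections import Counter
from functools import reduce

# Toy illustration of the MapReduce idea: each "node" maps over its own
# partition of the data, and the per-partition results are reduced together.
partitions = [
    ["patient reports pain", "pain relieved after medication"],
    ["no pain reported", "patient ambulating independently"],
]

def map_phase(partition):
    # Count words locally; in Hadoop this runs in parallel on each node.
    return Counter(word for line in partition for word in line.split())

def reduce_phase(a, b):
    # Merge two partial counts; Counter addition sums matching keys.
    return a + b

totals = reduce(reduce_phase, (map_phase(p) for p in partitions))
print(totals["pain"])  # -> 3
```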
So, where some of you may have grown up creating your SPSS codebook, forget that. The data sets that are out there are never going to fit into SPSS. So, we need to think
about how to manage them differently. Computer scientists have
done better work on this than we will do. There are basic tools
for data management. Spark and Storm are two
that you may run into. This chart will be most
useful to you as you interact with colleagues and
try to keep up with the words they’re using. And you can — every one of
these data elements has a place on the internet. You can go and learn more
information about what these different
processes do. Machine learning, Bonnie
mentioned is one of the most important techniques
in data mining. Machine learning techniques
vary — range widely, and they are designed to
capitalize both on the machine processing ability
as well as the size of the data processed. Visualization: you’ll hear a little bit more about Sue’s work this afternoon. Visualization is really an
early part, and I just want to pause and have you think
about how visualization is developing on two
trends right now. We see visualization of data
and visual analytics. And they’re not exactly
the same thing. Visual analytics is the
interpretive and analytical use of graphics and images,
whereas visualization is the presentation for the
purposes of generating insight. And I know they sound kind
of slippery, but their communities are really
quite different. Text mining, we heard also
about being a type of natural language processing,
very helpful to understand how we go from a body of
text to a body of words and phrases into something
that is meaningful and structured.
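As a toy sketch of that move from free text to structured counts, here is a crude bag-of-words pass; the notes and the stopword list are invented for illustration.

```python
import re
from collections import Counter

notes = [
    "Patient reports sharp pain in left knee; pain worse at night.",
    "No pain today. Patient slept well and ambulated in hallway.",
]

stopwords = {"in", "at", "and", "no", "the", "of"}

def tokens(text):
    # Lowercase, strip punctuation, drop stopwords: a crude bag of words.
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in stopwords]

term_counts = Counter(w for note in notes for w in tokens(note))
print(term_counts.most_common(3))   # e.g., [('pain', 3), ('patient', 2), ...]
```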
So, these sets of tools, of which there are over 100 listed on this screen, are
tools where there are specialists that can use them,
and interacting, exploring: What do they do? Why would I need them? Who’s building them at your university? You can begin to understand
how to build partnerships, and how to build the support
that you would need. For the most part though,
the big data that’s available to you could and
can support a wide range of study designs from
observational studies through survey research and
population health studies. I call your attention
specifically to the practical clinical trials,
which is getting a lot of press lately because it is
the application of clinical trial concepts in thinking
to data that is not specifically collected for
the clinical trial, often data that is pulled from
electronic health records. Practical clinical trials
use statistical methods to cope with or to manage the
lack of randomization that often comes with clinical
trials, and it’s designed, and it’s actually getting
rather quite widely embraced by groups to better
understand the phenomenon and better leverage
clinical data. Now, is practical clinical
trials the same as big data? I would say actually, no. The data that comes out
of the clinical record is messy. So, that makes it a little
bit like big data. There’s lots of it. So, that makes it a little
bit like big data. It can be very different
kinds of data, image data, as well as numerical —
okay, so that can be [unintelligible], but the
bottom line is almost all the practical clinical
trials occur in one setting where the data is
all in one computer. So, issues about location,
storage management, processing, and so on; things like that don’t happen within practical
clinical trials. But you often hear them
talked about at the same time. The advantage of practical
clinical trials and why people lean it towards the
big data initiative is this. In the past all the data in
a clinical trial, all the data types had to be
pre-specified ahead of time, and so if it was important, for example, to gather eye color, but your data model and your [unintelligible] questions didn’t include eye color, that was never in the clinical trial. So, the practical clinical trial
actually gives you the flexibility to add in post
hoc new data elements. All right, handling data. I’m going to go back to
this, because Bonnie mentioned it also. I want you to think about
this acronym OSEMN: obtain, scrub, explore, model, interpret. Whenever you’re handling big
data, and we do get questions about, “Well, how
do we do a big data experiment,” or, “How do we
teach nursing students to do big data experiments?” I’m going to tell you two
things. You don’t do a big data
experiment by yourself and secondly, you don’t put it
in an undergraduate course. [laughter] But we want to talk with
them about the results, all right, and so they should
know how to read a paper that came from a big data
exploration. First of all, the data
obtained, where did it come from? Where was it stored? Where are the different
places it was located? Secondly, scrub. How was it cleaned up? What were the data munging activities that went on to bring the data into
a common format? How are duplicates removed? How is a patient record
across many sites, which we all know is really hard to
put together? How’d the computer
know how to do it? Exploration, who did
the exploration? How early was the
exploration done? One of the challenges to
experiments done under data science is they lack the
precision of our traditional hypothesis generating
approaches, and so people say, “Well, you can’t really
believe them.” In fact, we cannot be so
meager in our thinking about how knowledge is
generated to say, “We’re going to discard anything that
wasn’t a clinical trial.” But at the same time, if
somebody explores the heck out of their data to the
point that they know exactly what it will say, it’s probably a little
bit misleading. What type of modeling tools
were employed and often when we talk about modeling tools
in nursing, we think of statistical modeling, but
newer models that are coming out of machine learning,
operations research, and Bayesian science also provide us with a great deal of flexibility in the models that we use, and simulation models in particular
could be helpful. Interpretation, and that’s
where we come in. The data say this; what does the expert think? As the experts thinking about the question and the answer, we’re there to
help with the interpretation. I do differ a little bit
with Bonnie about wanting the data to talk to me. I actually — they chatter
too much in my ears. Big data is a mess and if
you don’t take a principled approach to exploring the
big data, if you don’t take a principled approach, you
can spend an awful lot of time in the wrong direction. So, you have to be careful
about that, balancing your question as a guide or frame that sets the stage lighting for your study, and exploring
the data. Two points about handling
big data that are different than we usually see in
particular in experimental studies that are — and
survey studies that nursing is most familiar with. The first is whether the
schema is imposed during the
writing process or the reading process. So in general, you all have
built a — you’ve all written a — or maybe not
all, but several of you have, and many of you who
will write an NIH proposal where you neatly lay
out your variables. And then you lay
out your measures. And then you lay out the
elements of the measure. And then you say how you’re
going to collect it. Well, forget that. That’s actually important. Traditional SQL query
languages require you to do that, and you still
will need it. Our regular research isn’t
going away, but in large data science studies the
schema is developed only as the data are read in. So, we don’t always know
ahead of time what the investigator or the person
collecting the data actually wanted to label, but we
know when we get it how it has to fit
into the model. So, there’s a different
point, and the Hadoop cluster is the one that has
the utilities to do this. There’s a different point in
when the data are defined. So, in many studies that
we’re used to, experimental and survey studies, we
define the data ahead of time. We explain it. In big data science studies we tend to do it as it’s used.
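A small sketch of that difference, with hypothetical field names: schema-on-write fixes the variables before any data arrive (the SQL table in the comment), while schema-on-read keeps the raw records and imposes structure only at read time.

```python
import json

# Schema-on-write would fix the variables up front, e.g.:
#   CREATE TABLE symptoms (patient_id TEXT, pain_score INTEGER, noted_on DATE);
# Schema-on-read keeps raw records as captured; these fields are hypothetical.
raw_records = [
    '{"patient_id": "A1", "pain_score": "7", "noted_on": "2015-06-01"}',
    '{"patient_id": "A2", "noted_on": "2015-06-02", "mobility": "walks with cane"}',
]

def read_with_schema(record):
    """Impose the schema we care about now; keep unknown fields for later."""
    data = json.loads(record)
    return {
        "patient_id": data.get("patient_id"),
        "pain_score": int(data["pain_score"]) if "pain_score" in data else None,
        "extras": {k: v for k, v in data.items()
                   if k not in ("patient_id", "pain_score")},
    }

for rec in raw_records:
    print(read_with_schema(rec))
```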
Secondly, and this is going to make some of the experimentalists in the room really nervous: the data analytical workflow is distributed and allows for interim results. That is, you may run an analysis model multiple times as new data comes in, and it’s not like the early looks and peeks at the data that we talk about in clinical trials. That’s not what happens here; data science approaches tend to be highly iterative. As more data comes in the
models are refined. The models are rerun. To get insights from the
data then is a slightly different process. Dashboards are not enough. You cannot put the
descriptives of a large data set up when you have a
petabyte of data. It just doesn’t
make any sense. I mean, it’s hard
to look at it. So, it exhausts our human
capacity, and I’ve talked to a couple of people already
today about the importance of human factors. And cognition, cognitive
science can become real important here. Machine learning and advanced statistical algorithms are really quite helpful, because they allow for clustering of information and presentation in multiple ways, and so for interactive exploration of data.
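A small illustration of that clustering idea: the readings below are invented, and scikit-learn's KMeans stands in for the many algorithms one might actually use.

```python
# Group similar observations so a person can explore a few clusters
# instead of a wall of raw rows. The readings are made up.
from sklearn.cluster import KMeans

readings = [[120, 60], [125, 62], [118, 58],   # one apparent group
            [160, 95], [155, 90], [165, 99]]   # another

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(readings)
print(km.labels_)           # cluster membership for each reading
print(km.cluster_centers_)  # summary points a dashboard could display
```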
Now, Bonnie mentioned natural language processing as a way of understanding or
mining text, and that’s true, but it can also be
used to generate stories from text, or generate
stories from observation. So, the natural language
generation is an important aspect here. Here we have visualization
and visual analytics coming back in, and data
driven action. So, when you think about
it, the approach under big data and data science is often taking us to a much more short-cycle iterative
strategy, and so our data drives an action which in
turns drives an evaluation of the action, which in turn
looks for a correction. So, rather than getting
to the end point where we know what the answer is, we test end points along the
way, which takes us back to our original idea, or my
original idea that I hope you will continue to enjoy
with me, which is the question is really
important, because otherwise you can take a random
walk through a big pile of data, and it’s not going to
be helpful if you — you can modify your question. You can come up with new
questions, but as you’re looking at large data and
big data experiments, you want to be remembering
that you’re starting off with the direction. So, we’ll talk a little bit
about this by looking at some of the analytical
tools, but I want to remember — want to remind
you rather, the question is important because the data
can tell you whether one’s approach or another
is better, correct, or what have you. But it won’t tell you what
those approaches are. So, you need to know from
the beginning why you’re asking that question, and
the analytics can’t keep up with the data fast enough,
that is the data are coming — remember Facebook, half a
petabyte a day? The data are going to be
changed before you get your analysis plan written. So, you need to be looking
at robust and nimble analysis plans. I want to give you three
case studies to tell you a little bit about the data
systems, and we’re going to use these case studies to
illustrate three concepts. First one, how do you scale
an infrastructure? You should by now be
realizing that however powerful that Mac on
your desk is, it’s not necessarily going to help
you through a big data analysis, if that’s
the kind of analysis you want to have. Look back at your
questions for a minute. Look back at the
question you wrote. Did you write a question
that reads something like, “Why are asthma rates higher
in one county than another?” Well, you might need data
from 30 different sources to do that, from the air
quality in the school, to how much exercise kids get,
to what was the ozone level on Tuesday afternoon at
4:00 in the middle of the summer that year. So, you might need different
kinds of data to answer that, and having an
infrastructure to get out that data is important. Secondly, I want to talk to
you about a study about can we predict you will get
Alzheimer’s disease, and what some nurse
researchers are doing in that study. And finally to determine
which newborn’s status is changing. Bonnie brought up a study
earlier about the ability to predict change in status,
and this is another strategy, another approach
to doing that. So, let’s look at the
infrastructure, and I’m going to talk to you about
pSCANNER, built through the PCOR initiatives. And this is to me the single
most exciting thing that has happened in the last five
years in big data, the recognition among some teams
particularly by Lucila Ohno-Machado, our colleague from San Diego, that patient partnership is the core to building really good big data systems. So, in Lucila’s pSCANNER
approach, it’s a set of utilities to help with
patient outcomes, studies of patient outcomes. They have multiple sites,
and you’ll see in this box on the right hand side,
the VA, UCSF, UCSD. There are different sources
of patients and patient data, and the central
utility that pSCANNER holds in the middle is the resources to query and to analyze data from a
number of different sources. So, large amounts of data
can be brought together. Standard and analytical agreements, sorry, standards-based analytical data management, is important. That is, all the peripheral
sites use the same data management process. So, one has to plan these. We don’t start on Tuesday
and say, “Let’s grab data from a lot of places.” There does need to be
planning, but one of the things that this group did
in particular was to add privacy preserving
analytics. So, they took the idea of
what big data and data science strategies could do,
which is to do analytics locally and share results
centrally as a way of never having to integrate patient
data over a number of places, and this allows for much greater patient privacy.
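Here is a minimal sketch of that "analyze locally, share results centrally" pattern, with made-up site data: only counts and sums travel, never patient-level records, yet the pooled answer is identical to pooling the raw data.

```python
# Each site computes summary statistics over its own patients; only the
# summaries travel, never patient-level records. Site data are hypothetical.
site_a_bp = [128, 141, 119, 135]   # systolic readings held at site A
site_b_bp = [122, 150, 133]        # systolic readings held at site B

def local_summary(values):
    # Sufficient statistics for a pooled mean: count and sum.
    return len(values), sum(values)

summaries = [local_summary(site_a_bp), local_summary(site_b_bp)]
n = sum(count for count, _ in summaries)
total = sum(s for _, s in summaries)
print(f"pooled mean over {n} patients: {total / n:.1f}")
```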
Now, certainly there are questions that can’t be answered under this model. That’s not our worry at this time. The questions that can be answered are useful here. Now, the PCOR and PCORnet is
another set of utilities. You, I’m sure, right now are familiar with the Patient Centered Outcomes
Research Institute, but within that their
infrastructure built a network of networks, which
had its objective to create the data infrastructure, so
within these various networks, plans to have shared governance practices, data harmonized to a common standard, and software to manage data transfer are being put into place. So, we are moving from the idea
that an individual investigator can do a whole
study alone, or an investigator and their team
to thinking about how do we step into these large
essentially [coughs] excuse me, national and
international networks of resources, so we don’t
have to build everything by ourselves. You heard Eric mention
earlier today the biggest problem they have in the
human genome research right now that’s going on is every
laboratory has their own utilities and they don’t —
well, this is an attempt to say let’s not repeat that in
the clinical area. All right, so if we’re going
to have these principled research — Distributed
Research Networks, there’s a couple of principles that
drive them. First of all, the data
holders retain control of the data and are responsible
for maintaining the privacy. So, it’s an institution
level, not a research level solution. Secondly, the
standardization to a common data model occurs at the
institution, the place where the data’s collected, and so
what Bonnie was talking about earlier, about the
need to have these nursing language standards, these
are becoming very important. But it’s a big barrier to
enter because actually there’s four hospitals in
the country right now that do it, or something like
that, not many. There are more than four,
and then shared governance and agreements are needed, and one barrier is creating governance agreements across
institutions. I know you who have tried to
do that know it’s really hard, because institutions
sometimes view their data as an institutional asset, and
so, trying to improve sharing doesn’t always make
it work. However, [coughs], excuse
me, three aspects of distributed computing do
help here. The Hadoop structure, if you
only remember three jargon things that I say today,
it’s these three things. First of all, Hadoop, and
it’s not haydoop, and it’s not hudoop. It’s Hadoop. [laughter] I had to be told that. So, I know that exactly. Hadoop is an Apache service
that is a file system. It basically identifies all the different files where data are stored and how to integrate them, and it enables a process of parallel
processing of these data. A RESTful API is a web based
application programming interface that adheres to
architectural constraints. Now, this is the important part: stateless, agnostic to data storage, and forgetful. That is, the RESTful API allows for the data integration and allows for the data integration to disappear quickly. So, we don’t have the data
security problems that you have when you move data into
a common repository, and finally the Apache Spark is
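To make “stateless, agnostic, and forgetful” a little more concrete, here is a minimal sketch of what a site-level endpoint in that style could look like. It assumes Python with the Flask library; the route, variable names, and numbers are all hypothetical, not part of any actual network’s API.

```python
# A stateless endpoint: every request is self-contained, the server keeps no
# session, and only an aggregate result ever leaves the site.
from flask import Flask, jsonify

app = Flask(__name__)

LOCAL_HEART_RATES = [72.0, 80.0, 76.0]  # stand-in for data held at this site

@app.route("/summary")
def summary():
    # Compute locally and return only the result; the caller can integrate
    # results across sites and then let them disappear ("forgetful").
    n = len(LOCAL_HEART_RATES)
    return jsonify(n=n, mean=sum(LOCAL_HEART_RATES) / n)

if __name__ == "__main__":
    app.run(port=5000)
```

The point of the sketch is only the shape: no state kept between calls, and no raw patient-level rows in the response.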
And finally, Apache Spark is a strategy to cluster both computing and data,
which allows the computing and data to be chunked together
and loaded into memory in small pieces. So, when we have these
massive data files that can’t fit into the computer
memory, there is a utility that’s already been set up,
that says, “We’ll analyze this set first,
then analyze that set, then the next set. And here’s how you
integrate them.” So, distributed
computing tools can be really very helpful.
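As one small illustration of that chunk-and-integrate style, here is roughly what a minimal PySpark job might look like; the HDFS path and column names are invented for the example.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# Spark reads the file in partitions and pulls small pieces into memory at a
# time, so the whole dataset never has to fit in memory at once.
spark = SparkSession.builder.appName("hr-summary").getOrCreate()

df = spark.read.parquet("hdfs:///nicu/ecg/")  # hypothetical path on HDFS
summary = df.groupBy("baby_id").agg(F.avg("heart_rate").alias("mean_hr"))

summary.show()  # partitions are analyzed separately, then integrated
spark.stop()
```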
Network-Based Analytics — I’m just going to touch on this very briefly. These aren’t in your
handout, and they’re useful because they help you
begin to think about the difference between what
we’ve done traditionally in research, which we will
continue to do, and what new data science affords us. And one of the examples I’m
going to give you is the difference between a
meta-analysis and a distributed
analytical approach. On the left-hand side is
meta-analysis, which is somewhat familiar
to nursing. It tends not to be
iterative. That is, we set up the test,
we do the query for the data, and the results from
the different sites are brought in to a common
place, and interpreted as a single whole study, whereas
in a distributed analysis, a data-analysis
process is set up. A data model is planned and
the data are analyzed at the various remote sites. And the results of
the analysis are brought together. So, the key difference
between what we’re familiar with in meta-analysis —
which is “bring all the data to me and let
me look at it” — and distributed analysis
is the idea that in a distributed analysis, the
same model is applied at multiple sites. And what gets integrated is
the results, not the dataset, which has certain
benefits for privacy preservation,
security of data in transit, and things like that.
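To sketch that difference in miniature: each site fits the same summary model locally, and only the results travel, never the patient-level rows. This toy Python example uses invented numbers purely for illustration.

```python
# Distributed analysis in miniature: fit locally, integrate results centrally.
def local_fit(site_values):
    # Runs inside each institution; no patient-level data leaves the site.
    return {"n": len(site_values), "sum": sum(site_values)}

def integrate(site_results):
    # The central analyst sees only these summaries, never the raw rows.
    total_n = sum(r["n"] for r in site_results)
    total_sum = sum(r["sum"] for r in site_results)
    return total_sum / total_n  # pooled estimate from results alone

site_a = [72.0, 80.0, 76.0]  # hypothetical measurements held at site A
site_b = [68.0, 74.0]        # hypothetical measurements held at site B
print(integrate([local_fit(site_a), local_fit(site_b)]))  # 74.0
```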
The kinds of questions that analytics and outcomes research
go after differ across these processes. This chart summarizes
several things. When you go back home and
someone tells you that they do big data and you don’t,
this is the chart that you use to say, “Oh, really? Tell me, what kind of
questions are you going after,” because often if
we’re looking for inference, we
cannot use big data science strategies. We have to use much more
traditional methods. But if we’re looking for patterns,
and predictions, and classifications, as Bonnie
said, the big data strategies are appropriate
and helpful. The last two lines are
important to think about. The common analytic
platforms you’ll notice under the big data ones,
these are probably somewhat unfamiliar to you, Hadoop
and Apache Mahout and the Spark machine learning library. Those have parallels in our
more familiar experimental world:
R, SAS, and Stata. So, there is a
size difference, there is a difference in the way they handle
data in a distributed fashion, and there is a difference in how the
developers get engaged. This chart just summarizes
different places where the algorithms that are
used can be found. So, sometimes as Bonnie
mentioned earlier, the very same algorithm can be used
[coughs], excuse me, in a number of different
big data studies. The one where I’m
actually seeing the greatest use come up this year is
the support vector machine, three up
from the bottom. But you’ll
notice that very few of these are what we
commonly think of as our standard tool in health
care, which is linear regression.
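For anyone who wants to see what one of these looks like in practice, here is a minimal support vector machine classifier in Python with scikit-learn, trained on synthetic data; it is purely illustrative and not the workflow of any study on the chart.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic stand-in for a real clinical dataset.
X, y = make_classification(n_samples=500, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = SVC(kernel="rbf")           # the support vector machine classifier
clf.fit(X_train, y_train)         # learn a boundary between the two classes
print(clf.score(X_test, y_test))  # accuracy on held-out data
```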
And there are another 15 different methodologies that people are proposing. So, the set of methodologies
is much more robust and able to handle these different
distributed systems. So, let’s move to two
clinical examples, and then we’ll wrap up and
have some questions. Can we predict who will get
Alzheimer’s disease? Some colleagues in Wisconsin
are interested in figuring out if we can know at 35 if
you’re likely to get Alzheimer’s disease. Now, if you, like me, got
really nervous when Eric kept saying, “And we can
predict at birth this and that,” I’m thinking, “Who’s
going to work with that family for the next 20
years, while they worry if this kid’s going to develop
this horrible disease?” Or, “Who’s going to help the
family maintain a sense of wholeness?” So, we’ve got a lot of
work out there for us, whether or not we ever run a
Hadoop cluster. So, Sterling Johnson wants
to predict who will get Alzheimer’s disease, and
they’re working on something called predictive
phenotyping. And what they do with this
is they create a score based on the neurofibrillary
tangles inside the neuron and the amyloid plaques
outside the neuron. So, they take lots of fMRIs
of individuals over the period from age 34 to
40, and they do a number of different kinds of imaging. They have over 100,000
different points per person that they can study, and
they’re trying to look at these clusters through
image analysis. And then they’re building a
data repository, so they can follow these people like a
Framingham approach over 30 years, to see who develops
dementia at earlier stages. Terrific project.
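Purely as a sketch of what clustering that many imaging-derived features per person might look like — with random numbers standing in for the real fMRI-derived measures, and k-means as just one plausible algorithm, not necessarily the one this team uses:

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical stand-in: 200 people, 50 imaging-derived features each.
rng = np.random.default_rng(0)
features = rng.normal(size=(200, 50))

# Group people into candidate phenotype clusters.
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(features)
print(np.bincount(km.labels_))  # how many people land in each cluster
```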
It’s a great mix of people from image analysis, from neuroscience, from
geriatric care. Terrific group of people. There’s a nurse
on this team. This is what’s important. When someone asked Bonnie
earlier, “How do you find out –?” Keep
talking to people. Get on teams. Join teams. So, Elisa Torres, who’s on
our faculty in the School of Nursing at Wisconsin is
working with this team, and she has a very
interesting question. There’s some good evidence
that exercise alters your brain structure. So, you have more
white matter hyperintensities if you
don’t exercise, and fewer of them
if you do — I hope I got that right,
Elisa — and having more of them actually shows some
relationship with developing dementia in what
they call mid to late life, like 70 to 80 years old. So, what Elisa is doing is
she’s taking Sterling’s work as a way to ask, “Whose
exercise patterns and physical activity should we monitor
for the next 30 years? Would that tell us who’s
likely to get dementia? And would that then
provide us with a pathway to intervention?” So, we don’t have to wait 30
years for the study to be done, to begin to identify
nursing questions, extract from these large data
studies information that is useful to nursing, and
explore this. Watch for her work. She’s really super. All right, let’s talk about
babies. We love babies. Okay, which baby
is going bad? Now, there was a very famous
study published in nursing around the late ’70’s that
talked about the expert judgment of NICU nurses and
how much better that was than any predictive
algorithm. So, it basically
said: you have an experienced NICU nurse. The nurse looks at the baby
and says, “Oh, that kid’s gone bad.” That nurse is [laughs] —
but that nurse is probably right. I mean, there’s some
evidence that there’s this — we don’t know what it is. Is it the fact that the
baby’s twitching? Is it the color of the skin? God knows, but there is
a human judgment. We will still need that, but
maybe because we have better analytics we’re going to get
closer to being able to actually mimic that expert
nurse, because there’s only one in every six hospitals. And we have a lot of
hospitals around. So, the ECG provides 1,000
readings a second, 1,000 readings a second, 86.4
million readings a day to get heart rate. Where are you
putting that data? How many of you’ve ever
looked at the EHR in a NICU? You might have a reading every 15
minutes — maybe — and if you were like me, they’re in
green. That’s what we used to do. I guess they don’t do that
anymore with the computer, but all that data’s back
there and we don’t get it. This is
where we could look at how practical clinical trials come into play. There’s some evidence for
assessing the waveform from impedance, which is another
part of the ECG assessment. To do that — and it’s
helpful to assess breathing — adds another 5.4 million
data points a day. One baby, right? We’ve got one baby. One baby, one day in the
NICU, 90 million data points, and if that baby is
in the NICU the typical length of time, 90 days, well,
that’s a lot of data points. What we
do right now is use simplifying assumptions and
sort of scan it. And we say, “It doesn’t look
too bad,” or, “This is all right,” but we need
to be thinking. We need to ask the
question: when does knowing data at that level
of precision help us understand a person?
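For the record, the arithmetic behind those numbers is just multiplication; here is a quick sketch using the figures from the talk (the 90-day stay is the stated typical length):

```python
ecg_per_second = 1_000
seconds_per_day = 24 * 60 * 60                      # 86,400 seconds
ecg_per_day = ecg_per_second * seconds_per_day      # 86,400,000 ("86.4 million")

impedance_per_day = 5_400_000                       # the breathing assessment
per_baby_per_day = ecg_per_day + impedance_per_day  # ~91.8 million ("90 million")

typical_stay_days = 90
print(f"{per_baby_per_day * typical_stay_days:,}")  # 8,262,000,000 per stay
```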
These three examples — the infrastructure building,
the Alzheimer’s disease prediction, and determining
what’s happening in the clinical area — use data science in
very different ways. They don’t all use
it the same way. This is a clinical
application of an analytical process that requires a lot
of buildup of the analytics, but at the moment it’s
actually targeted to the care of one person. The Alzheimer’s prediction
project is actually looking at predictions of
phenotypes, but they want to get to classes of
phenotypes, and not for an individual person. And the infrastructure we
hope will support lots of studies. So, let me wrap up by
talking about nursing engagement in the BD2K and
the precision medicine initiatives here at NIH, and
then some things — my homework assignment for you. There were, as I said, three
workshops carried out over a period of time in
2013 that have led to
the calls for proposals that we’re
seeing right now. The first one that’s had the
most — oh my God. My numbers are wrong. Look at that. It’s probably a
translation error. The first one has gotten the
most traction. How do we enhance training? And there’s a lot
of interest. And most of the emphasis is
on improving the skill of research scientists to
understand and appreciate data science, not to
actually make them data scientists. There was a smaller
initiative to make people actual data scientists, and
we need to think in our field how many nurse data
scientists are essential, and how many more nurses
with expertise in data science and nursing research
supported by data science do we need. Enabling clinical —
enabling the use of clinical data: the report from this
project is where a lot of the work on practical
clinical trials comes out. The framework for community
based standards hasn’t made the progress they had hoped,
because what has happened there is that
research communities — not the community that we think of
as nursing, but research communities — like their data
and their terminology. And they don’t
like to share. So, trying to harmonize
terminologies across a number of different research
traditions has been a significant challenge. There is a strong
informatics core of the BD2K initiative that focuses a
lot on the processes and frameworks for data
integration, and the use of metadata particularly for
metadata descriptions and standard ontologies. I can’t stress this
enough: whether at the point of
initial declaration or at the point of use,
formal ways of expressing data are becoming
absolutely essential to knowledge building. The i2b2 community —
Informatics for Integrating Biology and the Bedside — has
been helpful in providing utilities that
other groups can use. Judy Warren has a very nice
paper about how she worked with an i2b2 group to create
an ontology of a checklist to understand the kind of
tending problems that nurses deal with. Oops. So, beyond the volume, what
are we doing with big data? Well, there’s research. There’s innovation in areas
around vision recognition, particularly around
understanding facial expression and how machines
are beginning to understand facial expression. So, the possibility that we
might get clinical assistance for a patient
experiencing distress — because a monitor detected
that their face is showing that distress — could be
enormously helpful in prioritizing where we put
our nursing research. But the questions have to be
asked and answered by nurses to do that. Improvement in complex
simulation and the internet of things, which is
everywhere now, I understand. But understanding how we’re
going to leverage this for nursing: what would we do if
we had more data than we could possibly handle? Oh, we have that now. Well, what would we do if we
knew the questions to ask of all that data? That’s where we need to be. Computing power, I think, is
going to be continuing to increase, and our
intellectual investments in infrastructure and health
care are proving to be headed in the right direction. So, what we need to be doing
now is asking questions. Within nursing we need to
have, as I said earlier today, data scientists.
Nurse data scientists are important; we don’t
need a lot of them, but we do need some who can really
hold the concepts of nursing and the concepts of
data in their minds at the same time. More people like me, of
course, I want the world to look like me, more people
[unintelligible] nurse scientists with data
sophistication, and that can be each of you. That can be each of your
colleagues. We need to have a
recognition, just like we all know what a survey is
and what a t-test is, we need to have some
understanding of what a Hadoop cluster is and why
we would use it. We can’t simply say, “I
don’t want to do that.” [laughter] And then, we need to work
with data-intensive nurses in practice, because we’ll
be returning to practice like that, in that ICU with
those NICU babies. We’ll be returning
very data-intense information to clinicians, and we’re not
training our students nearly well enough to do that. My last request to you is
to continue to think about how we can improve the
informatics core to complement the statistical
training that people have. We really must
move informatics out of the EHR idea into thinking
about, as Bonnie mentioned, formalizing the structures
of information, things that consider team scientists and
citizen data scientists, and if you’ll all be with us in
— I’m sorry, in Brazil next month, I’ll be talking about
patients as data scientists and citizen data scientists. And to think more in a
much more robust way about methodology. We need to stop teaching
courses in qualitative methods in nursing and
quantitative methods in nursing, and start thinking
of a methodological framework that includes a
range of methodologies for the kinds of questions
nurses will answer. I thank you very much for
your time and your patience. I went too far. Goodness. Let me back up to where I
want to end and tell you that — forget it. It’s gone. [laughter] I now have Dr. Grady’s
stuff. [laughter] I wanted to introduce you to
my group and show you that this doesn’t all come from
me, but it comes from a lot of the people that I work
with. And my inspirational moment
for you is that in 1962, Hildegard Peplau pondered
whether automation would change nurses, or nursing,
or both. And I take a little bit of
license and say you need to spend this week thinking
about what knowledge is needed to ensure that nurses
and nursing change to leverage the power of big
data and precision medicine. Thanks very much. [applause]>>Mary Engler: Do we have
any questions?>>Patti Brennan: Well, this
is a brave woman because now she stands between you and
lunch. [laughter]>>Female Speaker: It’s
quick. Given that nurses prefer to
get information socially –>>Patti Brennan: Yes.>>Female Speaker: How do we
use social networks to share what we find with big data,
which is a big data question in itself, the social
networks of nurses.>>Patti Brennan: So, Suz’s
going to talk about that this afternoon. I’m not worried about that
now, I don’t — [laughter] — so your question has
three parts. First, we are a social discipline, so the kinds of tools
that we need out of the big data initiative have to map
to our social way of conveying and understanding
information. Second, we need to think of
ways to leverage social networks and social
networking tools to support that kind of interaction. So, we’re going to move —
sorry for any journal editors in the room. We’re moving pretty quickly
away from journal articles and pretty fast into blogs,
and into shared understanding and knowledge building. And the third piece is, I
don’t really know how we’re going to capture
emotion as data. I’m impressed with Jackie
Blatz’s [phonetic sp] work and Nancy Staggers [phonetic
sp] from Utah where they’ve been trying to figure out
once again at the handoff what are we conveying, and
half the time the words mean nothing. And it’s all about
the — “Oh, God. Get in there fast. This lady’s going down,” or
“Don’t worry about this kid, he really turned around. He’s fine.” And it’s not the words, it’s
the expression. So, what I’m thinking is that
body-wearable sensors that extract emotional content
from the spoken word might be part of what
supports that social interaction. Yes, hey. Hi, Inton [phonetic sp].>>Female Speaker:
[inaudible] –>>Patti Brennan: Yes.>>Female Speaker: —
950 million –>>Patti Brennan: 950
million data points [unintelligible].>>Female Speaker: I was
wondering — I’m trying to conceptualize what kind of
data points [inaudible] –>>Patti Brennan: Okay, so
when we take pictures of a house, we take a LiDAR
scanner and a camera that sits on the floor. It spins
310 degrees on the vertical and 360 degrees on the horizontal. It puts out laser beams and
the data we get back is the time it takes a laser beam
to get out, hit an object, and come back. And we do that over maybe 10
different times in a house of four or five rooms. So then, we stitch all those
images together, and we can recreate pictures, but when
you open our dataset — talk about privacy preservation — when I open the dataset on
your house, it’s an array of six numbers per line,
probably two million lines long, and so it’s all in the
processing. It
takes us about three days on a Condor cluster of nine
machines to process one dataset. I don’t do this, by the way. Our staff does it [laughs].
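As a sketch of what that dataset might look like in code — six numbers per line could plausibly be x, y, z coordinates plus three more values such as color or intensity, though the talk doesn’t specify; the array here is random stand-in data:

```python
import numpy as np

# Hypothetical stand-in for one scan: ~2 million lines of six numbers each.
rng = np.random.default_rng(0)
points = rng.random((2_000_000, 6))

xyz, extra = points[:, :3], points[:, 3:]        # one plausible split
extent = xyz.max(axis=0) - xyz.min(axis=0)       # bounding box of the scan
print(f"{points.nbytes / 1e6:.0f} MB per scan")  # ~96 MB, ~10 scans per house
```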
So anyway, one of our questions right now — any of you who are on the review
panel for my AHRQ grant, take a look at it — is this: we’re trying to figure out
which part of that needs to be in the medical record. If we had a full
reproduction of your house, and we could put it in your
clinical record, we might be able to actually figure out
if that bed for your mother is going to fit through the
door and can sit in the living room before the
durable medical goods people deliver it. Or we might be able to
understand hazards, or maybe even detect violence by
looking at the layout of a house. So, if we can do that, we
don’t want to put the whole 950 million data points in
the clinical record. We’ve been working with Epic
on this — they’ve been terrific. They don’t want the 950
million data points either, but we’re looking at how we
could take an extraction of it and put in just a
screen scrape of that, but still allow for
exploration.>>Female Speaker:
Thank you.>>Patti Brennan: Thank you. Have a good lunch. [applause]>>Female Speaker:
Thank you so much. [applause] [music playing]
