SANS DFIR WEBCAST – Network Forensics: What Are Your Investigations Missing?


– Hello everyone, and welcome to Network Forensics: What Are Your Investigations Missing? I'm Jason Keeler with
the SANS Institute and I will be
moderating this webcast. Today’s featured
speaker is Philip Hagen with Lewes Technology
Consulting. Before I turn things over to Philip, a quick note: the Q&A portion will take place at the end of the webcast. Please feel free to submit
your questions at any point by using the chat window. Right now I’d like to
introduce our featured speaker, Philip Hagen. – Hello everyone,
glad to be here, and good to see a couple of familiar names on the list. I hope we've got a good
program put together for you. I’m just going to go ahead
and start sharing my screen. All right, looks like
that's coming up well. All right, well as mentioned earlier, this is Network Forensics: What Are Your Investigations Missing? I'm currently coming to you
from a pretty cloudy day out here in southern Delaware. Hopefully it’s a little
bit better weather where you might be. Let me go over real
briefly what my goals are for this presentation. First of all, this is
going to be a brief primer on network forensics. We’re not going to be able
to get too deep in an hour but I think it’ll
definitely be enough to try and get you interested and show where network forensics is going. One of the areas that I
do like to focus on is where are the existing
sources of evidence that can help on a
network-based investigation, or any investigation that
involves the network, not just a network-based one. And where are those
pieces of information that you might be able to use. I have a couple of
real-world examples where network forensics
helps complete a picture in real investigations,
and I hope that those will drive home a few
of the points I’ve got. And I really want to look ahead to the next three
to five years maybe, even if that long, to see how
I foresee network forensics becoming a more core
part of our practice. And of course I really just
want to get you interested in the topic, it’s
something that I work with very very frequently,
and I really do enjoy it, and hope that it
spurs enough interest to maybe have you learn
some more, maybe come see me in one of the upcoming
network forensics classes. Or if nothing else, ideally just improve the day-to-day work that you're doing on the job,
or your personal research. Just so that I’m not
completely a disembodied voice at the other end
of the internet, just a little
background on myself. I’m an Air Force Academy
graduate, class of 1998. I was a computer
science major there. I first got interested
in information security at that point. Really, had no
existing community, information security
was was still this thing that people talked about
in very hushed tones, and it was just emerging
into something of its own. After I graduated,
went into active duty, I was an Air Force
communications officer, worked at two assignments. I was in Beale Air Force
Base in Northern California, and then transferred
over to the Pentagon where I was responsible
for some practitioner-type information security
I would call it. We were on a very vulnerable
network at the time and had to field some
pretty unique requirements. So it was something
that really showed me the real-world application
of information security and how difficult it is to
balance user requirements versus reporting and
security requirements. In 2003, shifted out of the
Air Force, became a contractor. Actually, that's why I recognize a bunch of the names from the list of folks in the webcast today. I got to do some pretty
fascinating work there, including working with Rob Lee, with a couple other
folks in the room today. And eventually wound up
managing the computer forensic practice there; we had about 85 folks working in a lot of the government labs in the greater DC area, and some other locations
around the world. And really brought home
a lot of what I’d learned at that point and made
it a lot more real. It was a lot more
comprehensive practice than I had worked in previously. Since then I have
transferred off to become an
independent consultant, primarily doing computer forensic and information security consulting at this point. A lot of the work is federal law enforcement, and then also a bit on the commercial side as well. And I've since begun to work with SANS. I do currently teach Forensics 558, our network forensics course, and I've got one of those coming up at a Community SANS event in a couple of weeks. So maybe there's a chance
we can get a few of you to come down to
Quantico Virginia, and I’ve got the dates coming
up later on when that is. All right, so, the obligatory
brief computer forensics history slide. Back originally of course
we saw dead-box forensics, find a system that was involved
in an investigation somehow or relevant to an investigation,
pull a hard drive, preserve the evidence,
work on working copies. And at that point
you produced a report and you moved on to the next phase. It was very difficult to
handle anything dynamic because the practice was
focused on data at rest. Well then we moved into
the memory forensics era, which is obviously
in full swing now; there's so much great work going on, with the Month of Volatility Plugins this month, and so much other research
and real-world products coming out to help
in that regard. But we finally had to adapt to
the fact that data may change and this was a
pretty radical thing for the community to
deal with at the time, because, well what do you mean that you can’t
collect the evidence without changing the evidence? When you run this
memory-dumping utility you’re going to have evidence
of the memory-dumping utility in the evidence. It’s this circular
logic that people really really had
a hard time with. That’s something that
we’ve just finally begun to get our brains around
collectively as a community. And be able to
incorporate live response and similar type of evidence
into our investigations. Well then we move into
what I would say is the current evolution
of this process, and we’re incorporating
network forensics. And network forensics
is also something that is very dynamic, obviously, but it's such that if you don't catch that packet as it's crossing the wire, or if you don't catch that
piece of network evidence you’re going to
be in a situation where you don’t have
quote-unquote all the evidence. And that is really something
that we are struggling with. You can’t very well serve
a router with a subpoena or with a preservation
of evidence request and say please keep
all those packets that you passed last week
while I go get a warrant. That’s obviously
never going to happen, so we need to adapt our
practices to derive benefit from the network-based evidence. But at the same time we
also need to be realistic and understand that we’re not
going to get every single byte and be able to explain
why that happens and how that happens. Now obviously who knows what
the future is going to hold. Maybe it’s going to have
time-machine forensics, so if anybody here has taken
a forensic image of a DeLorean please let me know, I’d
be very interested to hear exactly how that went and
what you processed it with so we can work with Rob
and get that handled in the next version of the
SIFT workstation, I guess. Through all this,
the one constant has
been a need to adapt. We need to accommodate all
these new sources of evidence as they come into widespread use and we see them out in the wild. And we need to incorporate
them into our approach because very seldom do we
have a situation where, hey we can do memory now so we don’t do
dead-box forensics, we don’t look at the
hard drive anymore. Oh, we can do network, now
we don’t have to do anything. That’s ludicrous. We’re never going to
do something like that, instead we want to use all
these new sources of evidence, all the new research
and capabilities to better complete our picture and provide a more
comprehensive understanding of an incident that we
may be investigating. And a really key example
that I love to use for this was a paper that Jesse
Kornblum put out in 2006, which is ancient history in
the computer forensic world, but it was called Using
Every Part of the Buffalo. And even in 2006,
try to think back to where memory forensics
was at the time. Memory forensics was very new; we weren't really even talking about it at conferences. But even then
Jesse’s paper said, hey look if you’re using
memory forensics that’s great, but here are some things
that you’re not yet using, you’re not incorporating,
you’re overlooking. So we’re seeing this cycle
speed up and speed up, and that’s the point
that I want to make is that the only constant
we’re going to see is that things are going
to continue to change. And it’s probably going
to get even faster, so the better we are able
to adapt our processes to handle these new sources the better off we’ll be
in the long run of course. So just a real brief look at what I see on the network horizon right now, within the six-month
window, or current even. Primarily we’re seeing
the network used increasingly for core functions
of an operating system to run software. Software installation
and updates, hey, that's like old hat, right, we've been doing that since the WSUS server was a
thing back in the early 2000s, early to mid-2000s. What we’re seeing now is
this concept of, for example, slipstreamed updates. We are seeing Chrome,
which I think is the model that a lot of these software
vendors are going to move to, that just silently
in the background
downloads its updates, starts applying
patches when it can, and if it needs to
restart it just gives you the little warning and
says, OK, it's time to go. We're seeing that to an extent in the Windows operating system, but I definitely foresee the Chrome model of this slipstreamed background updating, where you don't even know it's happening, that's going to
become more common. And as far as installations
go, the Mac App Store and all these various application download centers that the platforms are employing now are becoming pretty common. Also, I think, in backup, I
hope everybody is familiar or at least heard of
this Dropbox service, kind of a darling out of the
Y Combinator startup world in Silicon Valley. That type of service lets you synchronize your documents across multiple platforms, multiple systems simultaneously, and that is something I foresee becoming more integrated into the operating system. We're seeing backup as
well, where it's off-siting your backups in near real-time. I use a backup service that I don't think I've looked at in two months, because I know it will yell at me if it doesn't work, and I know that it's
backing up everything that’s important to me. Background API
activity is something that I consider
pretty fascinating. API, application
programming interface, if you’re not familiar with it, it’s essentially a set
of rules and protocols for machine-to-machine
communication. So it’s how one
piece of software interacts with another
piece of software. Well as we see some of these
so-called cloud-type services and all these other hosted
platforms out there, looking into the next bullet,
everything is a service. These services have to
interact with the software on a client system and they
do that through these APIs. I saw a figure that said
somewhere north of 51% of all Internet traffic is
currently non-human generated, and the vast overwhelming
majority of that is derived from or
directly the result of these API calls
that are happening. I’ve got a couple
examples of how that works coming up in a bit, too. I mentioned Everything
as a Service, this has become more of a
marketing buzzword I guess than anything else. You’ve got your
software as a service, your platform as a service,
infrastructure as a service, everything as a service. And those are all a result of the centralizing
of computing resources, whether that is something
like an Amazon EC2 or Rackspace-type of solution. You're able to off-site
a vast majority of your functionality, but
of course all these services are fundamentally relying upon
the network to communicate, to manage themselves,
and certainly at the end, to provide data or information
wherever it’s needed. Now despite all these brand-new,
fascinating, sexy things that you can use a network for, it’s still good for
the old standbys. Malware’s got to talk,
malware’s got to beacon out to its command-and-control
server. When a network is compromised, it's compromised for a reason,
and it’s probably to watch or collect or extract
information from
that target network. And generally that’s happening over some kind of
a network link. You don't really see too many of those crazy Hollywood-style transfers over the air through exotic interfaces when the network is
there, and it works. And probably one of my favorites is, bad people still
talk about bad stuff, because hey man there’s
dumb bad guys out there, and the more of
them we can wrap up because they like to talk
about the things they do over email or instant message
or any of these protocols, the better. One case in point, a case that I have been able to support for a while now. We had someone who for
some reason archived about four to six years
of their instant messages on their own system. So we had 190,000-some
instant messages that we could go through. Now in that case of course
we were pulling that off of a disk, but it just shows that there’s really no
slowing down in the rate that bad guys are willing
to talk, communicate, and collaborate over
open network links. So there are two main ways
to acquire network data. And the one that most people
think of right off the bat is live capture on the wire. This is going to be stored
usually in a pcap format if you’re familiar
with that term, the utility tcpdump is
what’s most often used to collect it. If you’re in a GUI
environment you can use what’s called Wireshark. Some of these utilities
may be familiar to you, if not, certainly things
that you can dive into, generally free utilities. But when capturing
data on the wire there are what I would call two primary modes of operation. One is to collect headers only and one is to collect full content. Now if there are any folks in the room that are law enforcement or have a law
enforcement background, the header capture is
analogous to a pen register or a trap-and-trace device. That’s the sample that you see in the little blue
box down below. And that is just going to be
the metadata about the packet. So if you look at this packet, I’m pretty sure you
can see my cursor, you’re going to see a source
IP address, a source port, a destination IP
address and port. And there’s a couple
other bits of information that are embedded in here
according to the IP protocol. And in this case
what you would find if you were to decode all that is this is a UDP port 53 packet. And that’s all that we
can tell at this point. If you're familiar with most network communications, you may be quick to think, well hey, that's DNS. And that's not
really a bad guess. However, because we only have
the headers we don’t know. All we can truly say,
all we can observe, is a source IP and port,
a destination IP and port, and some other information, including the
length of the packet that was originally on the wire versus what was just
captured in this header. And a few other pieces of data. However, once we transition
into a full-content capture, you see that it adds on
the boldface fields here. This, in the law
enforcement analogy, is a Title III or a wiretap, so this is when you’re
getting the entire packet. And what we see here, by parsing out this information, is that truly this is consistent with a DNS query, and this DNS query is for the A record associated with the hostname cnn.com.
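As an aside for anyone who wants to experiment with that distinction, here is a minimal sketch in Python using the scapy library (my own illustration, not the tooling from the slide): the first print is the pen-register-style view built only from the headers, and the DNS query name only appears when the full content is available to decode.

```python
# Minimal sketch (assumes the scapy library is installed): contrast what a
# header-only view of a UDP/53 packet gives you with a full-content decode.
from scapy.all import sniff, IP, UDP, DNSQR

def show(pkt):
    if IP in pkt and UDP in pkt:
        # "Pen register" view: just the metadata from the headers.
        print(f"{pkt[IP].src}:{pkt[UDP].sport} -> "
              f"{pkt[IP].dst}:{pkt[UDP].dport} len={pkt[IP].len}")
        # "Full content" view: decode the payload as a DNS query, if present.
        if pkt.haslayer(DNSQR):
            print("  DNS query for:", pkt[DNSQR].qname.decode())

# Capture a handful of packets to or from UDP port 53 (requires privileges).
sniff(filter="udp port 53", prn=show, count=10)
```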
Is the full content in this case more helpful, more extensive? Oh, absolutely, there's no doubt about that. However, I want to be really
careful to avoid the pitfall of saying that header data
alone is not worthwhile. Header data can be extremely
extremely valuable, especially when
it’s all you have. Aside from the procedural
difficulties sometimes of being allowed to
capture full content or put a wiretap in place, the amount of storage
required is huge, and there’s a lot of
other potential drawbacks. There’s potential for lost
data on full content capture. However, what I'll say on being able to capture header data is a little bit of real-world information, a real-world case study so to speak. I had a situation where
a large number of systems around the world were
backing up their data via FTP to one system. We identified that system and we were looking
at the headers. We had a pen register up on that FTP target. By looking just at the headers
we were able to map out the entire
architecture, worldwide, of where all these systems were and what their
backup schedule was, roughly how much data
each one was backing up, and through some other
analytic insight that we had we were able to characterize
what kind of activity each of those remote
systems around the world was responsible for within
this global architecture. It was very very successful
and there are some other big successes that we had
out of that pen register that was up, and we would
have never been able to get a Title III or a wiretap
approved in that case, so just having headers
was wildly successful. So even though full content is great, it certainly does have its drawbacks, and a lot of times headers can get you what you need.
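To make that concrete, here is a toy sketch of the kind of header-only analysis just described. It assumes the pen-register output was exported to a CSV with hypothetical columns (timestamp, src_ip, dst_ip, bytes); the real data and tooling in that case were different, so treat this purely as an illustration.

```python
# Toy sketch: characterize per-source backup behavior from header-only data.
# Assumes a CSV export with hypothetical columns: timestamp (ISO 8601),
# src_ip, dst_ip, bytes.
import csv
from collections import defaultdict
from datetime import datetime

totals = defaultdict(int)   # bytes sent per remote system
hours = defaultdict(list)   # hour-of-day each remote system was seen

with open("ftp_headers.csv", newline="") as fh:
    for row in csv.DictReader(fh):
        ts = datetime.fromisoformat(row["timestamp"])
        totals[row["src_ip"]] += int(row["bytes"])
        hours[row["src_ip"]].append(ts.hour)

for src, total in sorted(totals.items(), key=lambda kv: kv[1], reverse=True):
    busiest = max(set(hours[src]), key=hours[src].count)
    print(f"{src}: ~{total / 1e6:.1f} MB total, most active around {busiest:02d}:00")
```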
So a second way to acquire network-based evidence to support an investigation is one that's actually not often thought of as
logs from services and infrastructure
items that assist with or facilitate network transfer. So for example if we’re
looking at proxy servers, a web proxy server
logs every URL that the systems behind
it are requesting. That’s a very valuable
piece of evidence; it can be, I might even say, invaluable, depending on what the aims
of the investigation are. But that’s something that
should never be overlooked. Firewalls may in some
cases keep packet logs. Probably in most
cases I would say they would be logging
packets that are denied, which can be useful, but I’ve
even seen some organizations that logged every packet
that was successful. It got a little bit
excessive to say the least. But in the case that we
were working at the time it was all we had to go
on, and it was very helpful to look back into
their archival past. Flow records: if there's anyone in the webcast today who's familiar with
network architectures and things like that, setting up large networks
often involves logging what’s called traffic flow, and that’s going to just be a very very high-level
accounting of how
much data flows from point A to point
B or link A to link B. It’s something that most
network admins will use to determine, is my
network link saturated, is there a vast
migration of data between portions of my network. So it’s usually used for
an administrative purpose. However, it can be
very valuable to us if we’re looking at,
was there a large cross-flow of information
between the production net and the executive’s
subnet, for example. Or a large transfer of
data from a set of servers to an off-site link they
really shouldn't be going to. That can certainly be useful to us in trying to hone our investigation down.
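As a simple illustration of that kind of triage, here is a sketch that reads a hypothetical CSV export of flow records (src_ip, dst_ip, bytes; real NetFlow exports carry more fields) and flags large transfers between two subnets of interest. The subnet ranges and threshold are placeholders.

```python
# Minimal sketch: flag unusually large flows between two subnets using a
# hypothetical CSV export of flow records (src_ip, dst_ip, bytes columns).
import csv
import ipaddress
from collections import defaultdict

PROD = ipaddress.ip_network("10.1.0.0/16")   # production subnet (placeholder)
EXEC = ipaddress.ip_network("10.9.0.0/24")   # executive subnet (placeholder)

cross_subnet_bytes = defaultdict(int)

with open("flows.csv", newline="") as fh:
    for row in csv.DictReader(fh):
        src = ipaddress.ip_address(row["src_ip"])
        dst = ipaddress.ip_address(row["dst_ip"])
        if src in PROD and dst in EXEC:
            cross_subnet_bytes[(row["src_ip"], row["dst_ip"])] += int(row["bytes"])

# Report any production-to-executive flows totaling more than 100 MB.
for (src, dst), total in cross_subnet_bytes.items():
    if total > 100 * 10**6:
        print(f"Large cross-subnet transfer: {src} -> {dst}, {total / 1e6:.0f} MB")
```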
Now what I love about all these data sources is that they probably already exist, so if you're in an
incident response role, and you walk into a
victim organization, you can very reasonably
say hey everybody, can we have all your proxy logs? And they’re probably not
going to look at you too funny because that’s a very common
thing for them to have. I think that not
asking for these is doing a disservice
to the investigation to be completely honest. Just because they do exist, and tools are readily
available to analyze them. Now kind of jumping outside
the mold a little bit, or thinking out of
the box, rather, there’s a lot of new
technologies out there, and I see that there’s some
folks from Carbon Black in the room, I’m not
changing what I’m saying because they’re
here, I’ve said this a number of times before, I’m very impressed
with the product that they’ve put together. The analogy that they’ll use is, it is the security
camera for your network. Or it is the equivalent
to the airlines’ flight data recorder
for your network. And it’s almost an
agent-based setup where the endpoints
on the network will report in to
a central server all manner of
useful information, such as whenever certain
files were opened or executables are run or,
from a network perspective, whenever a network
socket is opened. So we’re almost getting
firewall-level granularity, but we’re getting that
from the endpoints. So when we see cross
flow of information between subnets that
doesn’t cross a firewall, that’s something that a
firewall log is going to miss, but a utility or
something along the lines of Carbon Black, it’s
collecting all this information in the background so when you
walk into an incident response if that data is
available it can provide a very very granular picture
of what was happening on the network side of
the house in the past, and you don’t really have
to think too hard about it. That's just a piece of technology that I've been very
impressed to see grow, and I foresee a lot of other
interesting out-of-the-box type utilities that will
be able to help us as well. Now this is not a
technical thing to look at. We need to consider things
that are non-technical as well. I hope for your sanity
nobody has had to walk into a data center that
looked like this, but in order to
protect the guilty I will not ask anybody
if you’ve walked into a data center that
looks like this, or if you have a data
center that looks like this. But one thing that I
think is really important is to sit down and talk with the network admins. They can help explain what even the best-commented configuration
file can’t explain, which is why was
something done like this. Why was X done, why was
this setting put in place. Why are the logs stored
here in this partition versus this other system. Those can all be very
very useful to us as investigators
because we can capture the configuration files, we
can capture the log files, we can enter them into evidence. But then when you're
sitting there looking at it you can be left with an
infinite number of possibilities on why something may have
been done the way it was. So let me go back to the
concept of augment, not replace. I put a picture of the
U2 up here not because they fly it out of the
base I used to work at in California but
because I think it’s a really cool airplane
and I think it’s an incredibly good analogy for our approach,
what our approach should be. The U2 is a jet for which you can essentially build a new configuration depending on the commander's requirements. So for example, a
commander comes in and says I need this loiter time, it’s
a certain number of hours, and I need it to be
looking with these sensors, and I need this other
capability and I need datalink in this many minutes or hours
or whatever the case may be. And what the people
are going to do, the maintainers who are
responsible for this airframe, they’re going to pretty
much build a new jet. It’s kind of like watching
a Lego set put together, and I don’t make
that analogy lightly, it’s really impressive to see. They will put together all of
these different capabilities and build exactly what is needed to address the commander’s
requirements in the field. In that same vein I think
that we need to continue corroborating our findings. When we look at a dead-box
analysis, for example, we can refine our understanding
of what was happening on that system at a certain
time by incorporating the network perspective
into our analysis. So, a couple of examples of this. PsExec is a utility that you can use on Windows to remotely launch a
command on another system. So if you see evidence
of PsExec being used on a Windows system
maybe a victim system that you know or
suspect was compromised, that can be a very useful observation, a good finding. However, if we are then able to use the network side, some kind of network data source, we could potentially determine:
was that PsExec successful, did it return data, did
it return a little bit or a lot of data, is
it consistent with
an error message, or is it consistent with some
other kind of command output. It can be very
very useful there. Let’s say that we observed
that a sensitive file was accessed, and the
proverbial example is, hey, what if we see that the CAD file for the nuclear power plant was accessed. The A-time on the file system indicates that it was updated while we know an attacker was on the box. Well, that's a useful
finding again. However, was it extracted over the network? Exfiltrated is probably the right word; I'll change that before I PDF these documents. But if, for example, that CAD
file was a three-meg file, and then as soon as
it was accessed we see that there was a
three-meg data transfer from the compromised system
to a foreign IP address of some kind, that’s going to
create a much bigger finding than just, hey this
file was accessed. That I think is something
that’s extremely useful. A case that I did
support as well, this was the situation where
we saw a large database that was accessed and we saw
the data leave the building. And we were able to say
hey, the amount of data that left the building
is consistent with the attacker’s preferred
compression utility after it compressed this
set of database files. And it was absolutely incontrovertible proof at that point; we said there's very little chance this is anything other than your whole database walking out the door. Another one that's obviously
getting a lot of play is phishing. If we’re able to see that a
phishing link was clicked, maybe we see that that URL
show up in someone’s email, and then we see that
the URL also shows up in the browser history. Again, OK, person clicked or
otherwise opened that URL. However, what happened next? Was there a download attempt? Did it try to get an EXE file? Was it successful at
getting that EXE file? If it was successful
that would most likely be something we’d see in the
file system forensics. But if it was not we may see
that only in the proxy log, did the proxy block
that for some reason, did an antivirus solution,
inline antivirus, did that potentially
flag and deny that? Those are all
network-based indicators that would complete
our understanding. And then, even more so on that, do we see any kind of beaconing after that? What if every 60 or 120 seconds
after that link was clicked, we see a network connection
attempt leaving the network going to a suspected
or confirmed malware
command-and-control server. Well that’s going to be a
very very compelling picture that we can paint. I’ll mention this one briefly because I’ve got a
real-world example that we’re going to step
through in just a minute here. But search-history entry. When you go to Google and a lot of the other
search engines now, when you start
typing in your phrase you’re going to going to see
the search results appear in near-real-time. And the keystroke timing can be a very very telling
indicator on whether or not there is a human at the keyboard or whether it may be
a piece of malware. And I’m going to step into
that example right now. All right, so this is
the venerable search bar. This example comes out
of a version of Firefox running within Linux but it
can be pretty much any browser on any platform. Now let’s say we’ve
got an individual who’s starting to type
in this search box, and they type how to. You can see a whole lot
of the possibilities that Google thinks they
may be looking for. Maybe they really are looking
for how to cook spaghetti, or how to delete their
Facebook account. But in this case our person
that we’re looking at is searching for
how to dump a body. Now, it doesn't really matter what my investigation is about, I'm probably going to call this interesting. I may not call this a finding
if I were to see it, but I would definitely call it something worth looking
a little further into. But before this
person hits Enter, they remember, hey
wait, when I logged on I had this banner that said
that those network guys and girls can look at
everything I do on my computer, so I'm not going to search for that. I've changed my mind. Now if you've gone through one
of the new forensic courses or you’re familiar
with browser forensics you know that if the user
doesn’t hit enter here it’s not going to be entered
into the search history, it’s not going to be entered
into the browser history. There’s a lot of places
where this will not show up. You may get lucky and find
this somewhere in for example, memory or some other
type of structure. But those are definitely
going to be long shots. And they’re not going to
be as much of a guarantee as it would be if it were
if the user hit Enter at this point. So at this point the user realizes, like I said, this is a bad idea, we're
not going to do this. I’m going to go ahead
and backspace this out, and instead I’m going to
type how to bake a cake. Now I don’t know if anybody
here has played Portal, but if you are
familiar with the meme you know that the cake is a lie. There’s never a cake. So we know that this user
probably wasn’t looking for how to bake a cake. But when the user hits enter
the browser-based forensics is going to show us that this
is all they searched for. But if we go into
the network side, now we can see the
complete picture. And what you’ll see here,
after that scans down, this is output from Wireshark. If you’re not familiar with it, it’s a packet capture
and analysis utility. And this is showing every
single URL, API URL, that the Firefox web
browser was requesting as the user was typing. Now if you look over to the side of the red box you'll see that here's the first query as it was built up, a character at a time, and here's where
the user backspaced, and then here’s where the user
typed in their second query. Now this to me is pretty
clearly indicative of, it looks like a
keystroke logger. It looks like any kind of
a key logger out there, and it's very very valuable that it's happening a key at a time over the network. As long as you know enough to
research what the structure of these URLs are that
the Firefox browser is sending in the background, you can derive a tremendous
amount of investigative value from this kind of information. Now in this slide I’m showing
the example within Wireshark. There's no reason this kind of data couldn't also come from a proxy server. But you can see here pretty clearly the URLs are telling you a very compelling story.
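To show what reconstructing that keystroke stream can look like, here is a minimal sketch in Python. The suggest-style URLs and their q= parameter are invented for illustration; study the actual URL structure your browser and search engine use before relying on anything like this.

```python
# Minimal sketch: reconstruct what a user typed from a list of captured
# search-suggest URLs (e.g., pulled from a pcap or a proxy log).  The
# hostname and the "q=" parameter below are hypothetical.
from urllib.parse import urlparse, parse_qs

captured_urls = [
    "http://suggest.example.com/complete?q=how",
    "http://suggest.example.com/complete?q=how+to",
    "http://suggest.example.com/complete?q=how+to+d",
    "http://suggest.example.com/complete?q=how+to+du",
    "http://suggest.example.com/complete?q=how+to+b",
    "http://suggest.example.com/complete?q=how+to+bake+a+cake",
]

previous = ""
for url in captured_urls:
    typed = parse_qs(urlparse(url).query).get("q", [""])[0]
    if not typed.startswith(previous):
        # The new query no longer extends the old one: the user backspaced.
        print(f"  (user backspaced/changed query: '{previous}' -> '{typed}')")
    print("typed so far:", typed)
    previous = typed
```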
I mentioned keystroke timing on the previous slide. We had a situation where we were assisting with a company's investigation of one of their employees. The employee was found to have
been conducting inappropriate web activity and the first
response was, hey, it wasn't me. It was a virus. I didn't search for that bad stuff. The virus did it. Anybody ever hear
that response before? It’s probably one of the most
common excuses that I hear. However, the user did
have some kind of malware on their system. And I don’t remember if
it was a virus proper, but it was some
kind of nasty evil. So we were working on the
chicken-and-egg problem, which came first, the
unauthorized activity or the malware that
was on the system. And one of the
things that we pulled during this investigation
was the proxy logs. Now the proxy logs showed the individual using, I'm sorry, showed that the individual's system was responsible for activity
that was using Google to search for terms that
were consistent with the unauthorized
activity on their box. But we had a sufficient
enough set of data that we ran some statistical
analysis against it. We found that the
keystrokes were entered exactly one second
apart, plus or minus, I think the standard deviation
was 0.01 or 0.02 seconds. It was ridiculously small. So at that point we took
a step back and we said, you know what, there are a lot of good typists out in the world. I am pretty sure that there's
no typist that is good enough to type with that
degree of regularity. So I think we can
characterize that activity as machine-originated. And I don’t remember
what the outcome of that particular
investigation was, but we were able to rule that
out based on the network data that we had available to us. OK one other example that
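The statistical test itself is easy to reproduce. Here is a sketch with made-up timestamps showing the idea: a tiny standard deviation in the inter-request intervals is what lets you characterize the activity as machine-originated rather than human typing.

```python
# Minimal sketch: decide whether query keystrokes look human or scripted by
# looking at the spread of inter-request intervals.  The timestamps are
# illustrative; in practice they would come from proxy logs or packet captures.
from statistics import mean, stdev

# Seconds at which successive suggest requests were observed (example data).
request_times = [10.00, 11.00, 12.01, 13.00, 13.99, 15.00, 16.01]

intervals = [b - a for a, b in zip(request_times, request_times[1:])]
mu, sigma = mean(intervals), stdev(intervals)

print(f"mean interval = {mu:.3f}s, stddev = {sigma:.3f}s")
if sigma < 0.05:
    print("Intervals are suspiciously regular: likely machine-originated.")
else:
    print("Interval jitter looks consistent with a human at the keyboard.")
```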
OK, one other example that I like to talk about. If you haven't heard Chad Tilbury speak about some of his geolocation artifacts, you should; it's really, really impressive. And he talks about a
lot of the artifacts that might be left
within browser history. Well, this is a case where
this probably wouldn’t be left in the browser history
again unless you get lucky with memory or
something like that. So when I go on vacation I make SANS slides, that's what I do. So you can
pity me if you’d like, I do enjoy it. So what I did is I set this up, and I went to Google Maps, and I said show me where I
am, the blue dot feature. And it said do you want
to let this website see your current
location, and I said yes. And what was happening
in the background is this API request was made. You can see a couple of things that jump out at you right here. We can see that there's something that is labeled a MAC address and is consistent with a MAC address. I've changed all these. You can see that
there’s some SSIDs OK, so we’ve got some potential
network identifiers. We’ve got that repeated
a couple of times. And there’s this
thing called SS, which I later confirmed
was signal strength. So what’s happened here is
we’ve got this API request going out in the background, user doesn’t realize
that that happens, and Google comes back and
responds with something nice and convenient like this: here's your lat, your long, and here's how certain I am that this is where you are.
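Purely to illustrate how little work it takes to consume a response like that once it has been captured, here is a sketch that parses a geolocation-style JSON object. The field names and the placeholder coordinates are hypothetical, not the actual response from the slide.

```python
# Illustrative sketch only: parse a geolocation-style JSON response captured
# from the network.  The field names and values below are hypothetical; the
# real API response will differ, so adapt to what you actually observe.
import json

captured_response = '''
{
  "location": {"lat": 24.7, "lng": -81.1},
  "accuracy": 30.0
}
'''

data = json.loads(captured_response)
lat, lng = data["location"]["lat"], data["location"]["lng"]
print(f"Reported position: {lat}, {lng} (accuracy ~{data['accuracy']} m)")
```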
Now if you're frantically typing this into Google Maps or something, don't worry, I'll show you this map in a second, but what I want to say first
services on the iPad at all. And then I was also VPNed
through the iPad connection through the LTE connection
back to my house here in Delaware. So when it comes to location there’s a lot of things in there that could reasonably
obfuscate my physical location when it comes to a lot of
these network detections. However, sure enough, Google
comes back with an answer that puts me within about
20 or 30 feet of where I was on vacation with my family
in Marathon Key Florida. Very very impressed. When I started this example
I didn't really know what the response would be, what would come back, and I was like, wow. I now see, because it's
using SSID lookups why it is as accurate as it is, but a lot of those location
obfuscation artifacts or location obfuscation methods are rendered useless
through this particular API. Now if you’re in an enterprise
environment for example, and you were looking
at these in real time, privacy implications aside
this would certainly provide a very very complete and
difficult-to-refute position of where someone was
at any given time. If you’re a law enforcement
person then you can understand this could be of use
in stalking cases, potential restraining order
cases, things like that. You can bust somebody’s
alibi pretty clear open if you had this
evidence available and it was inconsistent with where they said they were at a given time. So stepping briefly out of the demonstrated practical and a little bit more into
the kind of future-looking, theoretical stuff. I don’t use Siri, I just got
it on my iPad with the update, I don’t have the
latest greatest phone. But what I find interesting is
their model of how it works. Siri works by sending a
highly compressed waveform of your voice from your
device to an Apple server. The Apple server then does its magic and converts it from the
waveform into text. And then the text is returned similar to what you saw
before, a JSON object. And it tells you a
per-word confidence score, and a timestamp. Obviously timestamps are something very important in our profession, and having
that per-word confidence could be very useful
in determining is this really what
the person said. So even if you didn't have that waveform available to you to hear and make a human judgment on, the text itself is very easily machine-parsable, machine-readable. That could be something
very valuable. Now do I really think that a
criminal is going to use Siri to look for the nearest
Lowe’s or Home Depot to go buy lye and a shovel so
that they could hide a body? Hey, I want to go back to
stupid criminals exist, they’re a thing, and I
would not be surprised if I were to hear
that there was a case that something
like this was used in refuting or confirming
somebody’s story. These are a couple
more examples. I mentioned Dropbox
before, but what I want to specifically call out
here is the fact that it is all API-based. It is all HTTPS,
so it is proxyable, a lot of these new services,
they want to be functional behind corporate firewalls
and proxy-type architectures, so they work over HTTP instead of requiring some kind of proprietary
or non-standard protocol. So this is something you can derive a lot
of value from as well. We’ll talk about HTTP and
some of the hurdles that SSL can put into our mix in a few minutes, but suffice it to say that Dropbox is something that you can look at via the network quite easily because it is HTTP-based. Facebook search,
I’m glad Rob’s here because I’ve used this
joke once or twice before so now I can actually
use it when he’s here, but if I want to stalk Rob
Lee and I’m in Facebook, I can do that just
in the same way that I did in the browser. You type in someone’s name and it’s going to do
the real-time search. And that again is sending your
information, a key at a time. I would even add
another one to this that I didn’t put on the slide,
but when you start typing a status update in
Facebook it is sending some of that
information in real time over to Facebook servers
so that they can say, did this person type a link; if they did type a link, I'm going to show
a link preview. If they did type another person's name, I want to look up that name. So you can actually
see status updates passed over the network
as they’re being built, which is the same kind
of benefit as we saw about search technology. Another one that I think
is utterly fascinating and I won’t lie, it’s so far
beyond me on a technical level I can just look at it with awe. Don’t worry about this
equation that I put up there, that’s just the first
thing that came up when I Googled
scary math equation because I think it applies here. But some researchers,
academic researchers, have identified a way of
looking at the file size of Google Maps tiles that
are returned over SSL and, by doing some incredible statistical analysis against them, identifying with a high degree of certainty what location on the globe, as well as what zoom level, the user was looking at in Google Maps at that particular time. That's all without
worrying about SSL. Is that something that is
going to be ready for court anytime soon? Probably not, however it
is something that indicates the direction that we’re going, these fascinating capabilities
that are out there. And I’m excited to see where
that kind of research goes because there’s that
and a lot of others in the same vein that
are going to become more and more popular and more commonly seen in the research community. So as with all this
wonderful stuff, we’ve got to talk
about challenges, what’s going to make it
difficult for us to do well in this space. Well primarily I’m gonna
put the law up there. There are a lot of
legal requirements that come with doing network
capture and network data. I mentioned a few that
law enforcement folks need to be concerned with
in terms of a pen register versus a Title III. There are a lot for all
of us to be concerned with in a corporate or government
or law enforcement capacity. I’m not a lawyer, at all, but I do work with
a lot of lawyers and I respect their opinions when it comes to
this kind of thing. Before you get too
creative in these make sure that you’re
engaging your inside counsel or engaging some kind
of legal representation so you don’t accidentally
overstep the bounds and get into a mess
with the Wiretap Act. You may remember when Google was publicly really smacked down by Congress and
privacy advocates because their Street
View cars were collecting all this
wireless information. I'm not making a judgment or a comment on what I think about what they were doing, but what I am saying is the legal hurdles that
you’ve got to go through, the legal requirements,
they’re significant, they’re very convoluted, they’re very difficult
to understand. And make sure that
you’re covering your butt doing these things. As long as you’ve got
your legal folks involved you’ll save yourself
a potential headache, quite large headache,
down the line. Talking about SSL and VPNs. Yes, these are a problem. They are not the
end-all be-all problem, however I do not think
that they’re going to end network forensics before
it starts, for example. However, there are a lot
of capabilities out there, a lot of technologies
like the Bluecoat proxies, and a few of the other SSL kind
of man-in-the-middle tools. As long as you can
control the endpoints, as is very common in a
corporate environment, if you control the SSL certificate stores of the endpoints you
can get in the middle of your client’s SSL,
break that back down to its plain text
equivalent, run your analysis while it’s plain text,
and then re-encrypt that on its way to the destination. There’s a lot of
precedent out there on how this is deployed
in a corporate environment and how this is
used efficiently. The other thing that
I would say on this is the SSL environment
itself is based on trust. And the trust is starting
to show signs of age, let’s just say. We’ve seen a number
of certificate
authorities compromised in the last 12 to 18 months; I think it was four or so at last count. As long as, well, there's
just the Adobe thing today. Today or yesterday
that came out. So there’s a lot of trust
that people place in SSL as a whole, and I
foresee the day when that is going to have to change
because we’re in a situation where SSL is not as
trustworthy as it claims to be in my opinion. VPN similarly, there are
ways of handling these when it comes to
network forensics. Even if all we’re getting
is header data, for example, think about the value that we can derive there. That is a consideration that we can make when we're doing our analysis. Hey, we're looking at
VPN, OK, well, what's left, can we look at just the headers, and if so is there
any use for it. If yes great, let’s
go that direction. If not OK, let’s
not waste our time. Just a question on
SSL man-in-the-middle, I want to address it real
quick while I’m on this bullet. The question is whether
there’s an answer to how man-in-the-middle will not work when Diffie-Hellman
cipher is used. I don’t have a specific
answer for that. I haven’t explored all the
intricacies of a lot of these different SSL protocols, but again I would definitely
say that at a very minimum we will always be able
to see the volume of data that is transferred,
within reason. Depending on if there were
particular countermeasures employed by an attacker
or by a malicious actor. There’s some value
to be gained there, and if the value stops at saying
yes this type of encryption was there and we were
unable to go any further, well at least we can make
that observation and press on. It will continue to be
an arms race I think between the malicious
actors and us, because there is always
a motivation to do bad and in turn we have a
motivation to do good on the other side of the coin. So it’s it’s not a specific
answer to your question but I foresee a future
where even more aggressive and more efficient cryptography
algorithms and methodologies are going to make
this more difficult. We’re just going to have to
continue to identify the best we can do within the
confines of the problem. Full packet captures
can take a lot of space. Storage is pretty cheap,
enterprise storage however is not cheap. If you’re talking about an
enterprise storage solution to capture and store
every packet coming across an organization’s
perimeter, you're going to see real quickly that you're not going to be able to store very much of the data coming across, say, that OC-3 connection into the organization. It's only going to last
maybe a couple of days or a couple of weeks. What we see in a
lot of organizations is a rolling capture buffer. Their full capture system
will keep 14 days worth of network traffic
on this segment. Maybe 30 days on a
more sensitive segment. And after that it’s
going to roll over. So that’s one way
of mitigating it. We’ll also see some
organizations say after our full capture
expires we’re going to carve that back
to its header data and we’re going to
keep the header data for a longer period of time. Six months or a year
or something like that. It's all a matter of how much you can afford; in a lot of these situations that's the case. But that's one way that we can address that challenge.
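Here is a rough sketch of that carve-back-to-headers idea, assuming the Python scapy library. Most shops do this with purpose-built capture appliances or command-line tools, so treat it only as an illustration of the concept.

```python
# Rough sketch (assuming the scapy library): strip payloads from an expired
# full-content capture so only the protocol headers are retained long-term.
from scapy.all import rdpcap, wrpcap, TCP, UDP

packets = rdpcap("full_capture.pcap")   # fine for a sketch; large captures
                                        # need a streaming approach instead

for pkt in packets:
    # Drop application data but keep Ethernet/IP/TCP/UDP headers.  The length
    # fields already recorded in the IP headers still reflect the original
    # packet sizes, which is what we want for header-only retention.
    if pkt.haslayer(TCP):
        pkt[TCP].remove_payload()
    elif pkt.haslayer(UDP):
        pkt[UDP].remove_payload()

wrpcap("headers_only.pcap", packets)
```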
And two non-technical challenges that we have to face: I
analogy for the situation that we’re up against. Our adversaries are
dedicated, they’re creative, and they’re not encumbered
by these pesky laws and regulations and processes. When they can adapt as
quickly as they need to continue meeting their goals, then we’re going to need to
adapt, we’re going to continue, we’re going to be on this
merry-go-round of adaptation and improving our capabilities. Hey, that’s job
security in one sense, but at the other end we need to make sure that our processes are not static, that
our processes themselves are designed to incorporate
a more dynamic flow, a dynamic development
of capabilities. And finally another
kind of non-technical is that a network
data is ephemeral. If you don’t capture
that packet it is gone, it is not coming back. Maybe you can carve that
somewhere out of slack or somewhere out of
memory or some place, but on a large-scale,
repeatable level, that packet is a one-time opportunity. And we need to be very, very aware of that; we need to understand the limitations that our capture solution or our analysis process has. Hey, we're only going to get
this many gigabits per second, this many megabits per
second captured or analyzed. But at the same time we just
need to be able to explain what those limitations are, and have people start
to to accept that. We’re going to need a better, a more accommodating kind
of managerial oversight, where that kind
of situation is OK and it was built
into the process. So what do I see on the horizon in the next couple of years? First and foremost, our APT-grade adversaries:
would never have to count higher than five if
we were just counting days between huge press releases
on the latest, greatest, nation-state something or other. It’s the new normal, and what
I see being really really important for people to notice
and and be be very clear on is this technology
will trickle down. Just as we see high-grade
technology used in government projects and
really elaborate use cases and then eventually trickle
down to our toasters and our fridges, we’re seeing these
nation-state capabilities trickle into the common
criminal element. We're seeing the criminal underground, the carder-type forums and the adversaries on those forums, use capabilities that just a few years ago were through-the-roof crazy; they would have absolutely floored the media if they had seen them, and now we're seeing them on a weekly basis, maybe. And hey, you know, most
malware still relies on network transport
of some kind. I read about a situation where a bank was using RSA key fobs or something like RSA, I'm
not saying it was that brand, but they’re using a kind
of one-time password or key-based authentication,
which is a good thing right. The bad guys are going to
be at a disadvantage there. So what the bad guys
did is they said anytime their malware
saw a password field, and the value entered in the password field ended in six digits, it actually sounded a real-time alarm in, I don't know, some warehouse in whatever country
they were in, and at that point they
knew they had 60 seconds in order to use that to
gain access to the account. That to me was a
huge development because the logistics involved with a real-time alerting system
are not not trivial at all. And I think seeing that
these criminal elements are becoming more eager to
use this high-grade technology for a more low-grade end is a
really important distinction to make, and I think if we're not ready to deal with 10 Stuxnet- or Flame-grade malware outbreaks a year then we're going to be at a disadvantage. Also of course
at-rest investigations
will never go away. You will never ever ever hear
me say anything like that. We just need to incorporate
more dynamic evidence. If you’re in an organization
that is not yet handling memory forensics for example
then I cannot urge you strongly enough that
this needs to be the way that you go, yesterday. Because we’re going to
see more dynamic evidence involved in our investigations
as time goes on. And something else is just
that we are shifting back to centralized computing
in a lot of ways. We’re seeing Amazon EC2,
we’re seeing a lot of these various scalable
solutions out there. And that means that the
network is going to become a core source of
evidence for us. I don’t know if anyone
has had an opportunity to read a book by an
author named Cliff Stoll; it's called The Cuckoo's Egg. If you have not, please
please please please, you should absolutely go get it, you can get a used one on
Amazon I think for four bucks. It details an intrusion into
a mainframe architecture, it was way back in the
day, I want to say it was like mid to late 80s, it’s
been a while since I’ve read it but it details the way
the author, Cliff Stoll, investigated the incident before network
forensics was a thing. I mean, this was the early, early days. And we saw him trace
through this situation using an investigative,
scientific mindset, and he was able to get to a conclusion with pretty good
evidence to support it. Between then and now computer
forensics has evolved into this very formal, very
well established capability, and as we move back to a
mainframe-type mentality where our computers
are somewhere else, our hard-core computers
are somewhere else, and we just have a front-end
to them in front of us, we need to handle
that computing mode with all of the knowledge
that we’ve gained since the Cuckoo’s Egg and
apply it to what we now know and how we now know we must
accomplish our investigations. So in summary I think
that network evidence is absolutely critical to
completing our understanding and establishing a comprehensive
picture of what happened during an incident
or during anything that we might be investigating. And I think that it is going
to become even more important because we do see so
many of these functions moving onto the network. We talked about a lot of
these elaborate, and even some other
not so elaborate network-based evidence
situations that we
may come across. And I think those are going to
become more and more common. If we incorporate dynamic data acquisition, and just the idea of dynamic data, into our workflow and into our processes now, we'll be much better
able to accommodate whatever the next dynamic
evidence source is going to be. And finally we’ve got to
stay agile and stay advanced. If we don’t keep learning
and keep improving ourselves I think that we’ll be at
a disadvantage as well. One way to do that: I think the path funneled through the SANS forensic curriculum is an ideal way to go. Obviously you
can say that I’m biased, but I think I’m also right. But moving from the
Forensics 406 to 508 and then branching
out into some of our various other courses,
including network forensics, I think that’s a really good way to build a strong foundation,
and then build on that as you move into some of the
more specialized functionality. And then looking forward, here are some of the upcoming events. And depending on which course
you may be interested in there’s there’s a lot of
different options here. I do want to draw
attention to the fact that if you’re going to CDI and
you register for your course before the middle of November you get 10% off; it's a very worthwhile event to attend. I had a really good
time there last year. But also for my own stuff,
if network forensics is something that
you're interested in, if this primer piqued your interest in a way that you would like to pursue further, I will be teaching a five-day course in Quantico, Virginia, at a Community SANS event in October, the dates are up here, and it's definitely something where I would be interested in seeing you there and hopefully getting you trained up and ready to attack this kind of evidence as it crosses your desk. So I'll end with this, just by saying we
will never be bored. We will always have new
challenges in front of us. And hope to hear great
things from folks and hear about the ways they
are using this information in their daily lives, and
with that I’ll open it up to any questions you might have and do my best to answer those. If you’ve got to
leave, I understand,
appreciate your time. And I appreciate you sticking around as we run a couple minutes over. All right. – Thank you, Phil. Now we'll start the Q&A section. If anybody has questions
feel free to type them to moderators in
the chat window. Have you found DNS
query logs useful? – Yes, absolutely. Certainly when we see
a lot of the malware using DNS domain-generation algorithms, for example the Conficker-type thing where it would generate new domains every day. Just being able to observe those
and determine which clients are making queries
and which clients are, what are the results,
can we subvert them, can we become authoritative
for that DNS name, that hostname, if
it makes sense? Those query logs are
invaluable in that regard. Now, again, you can look at those through the query logs on the DNS server, or if that's not reasonable you could always look at the network traffic and do that with tcpdump for example, and identify what queries are being made and then certainly what the responses coming back might be as well. So, absolutely.
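As a quick sketch of that kind of triage (assuming the scapy library and a made-up capture file name), counting unique query names per client is often enough to make DGA-style behavior stand out.

```python
# Minimal sketch (assuming scapy): pull DNS query names out of a capture and
# count unique domains per client, which can help surface DGA-style behavior
# where one host suddenly asks for hundreds of never-seen domains.
from collections import defaultdict
from scapy.all import rdpcap, IP, DNS, DNSQR

queries = defaultdict(set)   # client IP -> set of queried names

for pkt in rdpcap("dns_capture.pcap"):
    if IP in pkt and pkt.haslayer(DNS) and pkt[DNS].qr == 0 and pkt.haslayer(DNSQR):
        queries[pkt[IP].src].add(pkt[DNSQR].qname.decode().rstrip("."))

for client, names in sorted(queries.items(), key=lambda kv: len(kv[1]), reverse=True):
    print(f"{client}: {len(names)} unique domains queried")
```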
– So thank you so much, Philip, for your great presentation and for bringing this content to the SANS community. To our audience, we really
appreciate you listening in. For a schedule of all upcoming
and archived SANS webcasts, visit sans.org/webcast. Until next time, take care
and we hope to have you back again for the
next SANS webcast. – I see that Tom
Ferrara has a hand up. – Sorry. – I didn't know if that was a question or a misclick. To answer, I just saw
Robin posted a response on the DNS thing. Yes, you’re correct,
generally you can only get the responses
through packets. I’ve seen some DNS servers,
I want to say DJB DNS has some kind of
crazy logging options that you can log some
responses in there as well, but I prefer packets because
it's a bit more useful, at least in my mind. – I'd like to
thank you all again and we will see you at
the next SANS webcast. – Thanks everybody.

One comment

  • I am currently studying digital forensics and haven't quite gotten around to studying the networking side yet. How long do you think it would take me to learn the network forensics side of things and tools like Wireshark? Where would you recommend that I get started?
