BitCurator Consortium Webinar – bulk_extractor Beyond the Basics

Well, as Sam mentioned, this is an advanced topics webinar put on by the BitCurator Consortium. We're going to be talking about some of the advanced features of bulk_extractor, in a bit more depth about what those features are and how you can potentially use them with your born-digital collections. I'm Michael Olson, and I'm the service manager for the digital forensics lab. Sandy, do you want to briefly introduce yourself?

Sure. Hi, good morning, everyone. My name is Sandy Ortiz, and I'm the digital forensics lab assistant here at Stanford University.

Great. I'm going to spend just a few minutes at the beginning of the presentation providing a bit of context, and then I'm going to turn it over to Sandy, who will go into more detail about some of the advanced features. Thanks for joining.

Here's what we're going to cover in today's webinar. I'll very briefly go over what bulk_extractor is; I'm sure most folks are at least passingly familiar with what it is intended to do. Then I'll provide a little context for why we're interested in using it here at Stanford, and talk in a bit more detail about the software itself. After that I'll turn it over to Sandy, who will address the advanced features: she'll define what they are, and talk about the requirements to run them, some configuration information, some sample runs, and the results of those runs. At the very end we'll open it up for discussion and questions.
So, what is bulk_extractor? bulk_extractor is a powerful analysis tool that allows us to identify potentially sensitive information, such as Social Security numbers, credit card data, and other types of potentially sensitive information, in digital collections. It's important to note that it was developed for forensic investigations, and that is really important for digital archivists and librarians to understand when they start to look at the tool, particularly when we talk about the advanced features, how they're built out in the application, and, more importantly, how they're used together. bulk_extractor scans a disk image, a file, or a directory of files and extracts information from that data. One of the useful outcomes is that it generates histograms of the different features it finds, which are useful in analyzing and processing a born-digital collection.
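To make that concrete, here is a minimal sketch of what you might look at after a run finishes. The output directory name is hypothetical, and it assumes the standard bulk_extractor layout in which each scanner writes a feature file and, for many feature types, a histogram sorted with the most frequent values first:

    head -20 be-output/email_histogram.txt     # most frequent email addresses found
    wc -l be-output/ccn.txt be-output/pii.txt  # rough counts of credit-card and PII hits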
Now, the larger question, which really shapes how we've approached using bulk_extractor here at Stanford, is our institutional context. It's important to note that we're not actually using bulk_extractor in production yet; we're actively testing it and trying to figure out exactly how we want to apply it in our production workflows. We're really interested in having bulk_extractor automatically scan all of our collections to create histograms that a digital archivist can use to evaluate whether a born-digital archive has personally identifying information or other sensitive data in it. And finally, we're planning on using bulk_extractor to provide more in-depth analysis of the types of data contained in born-digital collections. For example, for a collection that contains a large volume of word-processed or textual files, a first pass through bulk_extractor may give a digital archivist further ideas about how to tune the application to look for collection-specific features. That applies not only to collections of textual data but to any type of born-digital collection: there's a fair bit of variation in the MIME types, file systems, file structures, and document types you will find, and based on that you can do some tuning of bulk_extractor to provide additional information about what's in the collection and how you might process it.

Part of the context for how we're approaching bulk_extractor here at Stanford is that we've become very sensitive to security concerns.
What you're looking at here is an example of our data risk classifications. What we're particularly interested in using bulk_extractor for is identifying the high-risk bucket of data: things like health insurance policy numbers, Social Security numbers, financial data, anything that, if it got out into the wild, could damage the reputation of the University. What's interesting is that these different buckets of data map to something like this: a list of approved services, that is, applications and systems that are approved to work with the different types of data. For example, if you look at the very first bullet here, we could essentially put high-risk data into any Zoom conference that we're doing at Stanford; not that you would necessarily want to do that, but that's just an example. Whereas if you look at number three, our Office 365 calendar, we're not supposed to put Social Security numbers in our calendar. That's probably not the best example, but if you were to scroll down further you would see lists of different kinds of file storage and systems that are either approved or not approved for handling that data. The point of showing you this is that this is the institutional context for how we want to use bulk_extractor and some of its advanced features: to help define what falls into particular buckets.
I'm now going to talk a little more about the bulk_extractor software itself. One thing that may not be initially apparent to users in the BitCurator environment is that bulk_extractor is an independent piece of software that is still under development; in other words, an earlier version of BitCurator is likely to include an earlier version of bulk_extractor. The current BitCurator release, 1.8.16, uses version 1.6.0-dev of bulk_extractor. I'm not going to go through all the release notes for this latest version, but it's important to note that there are incremental improvements to how particular scanners work, and improvements to the Bulk Extractor Viewer. If you're interested in learning more about the latest release, we've got a couple of pages of references at the end of this presentation, one of which points to the GitHub release notes for 1.6. It's also interesting to look at those notes for an inkling of a roadmap for future improvements and work slated to be done on bulk_extractor.
I also wanted to familiarize folks with the Bulk Extractor Viewer interface. You can actually do everything you want from the command line, but I think this is useful as a visual reference for how the application works. At the top is where you specify the target for your data set, whether that's a disk image or a directory of files. Just below that is a general options box, where you indicate things like stop lists, banner files, or alert lists, and underneath that are tuning parameters if you want to tune how the application runs its scans. Off to the right is your list of scanners, quite an extensive list of different types, and Sandy is going to talk more about those in a bit.
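For reference, the same choice of target is available from the command line. A minimal sketch with hypothetical paths, assuming the -R option described in the tool's usage text for recursing through a directory of files:

    # scan a disk image
    bulk_extractor -o out_image /path/to/image.E01

    # scan a directory of loose files
    bulk_extractor -o out_files -R /path/to/directory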
Just before I hand it over to Sandy, I want to talk a little about how we came up with the list of features we were interested in showing in this webinar. As I mentioned, we're not currently using bulk_extractor in production, but we didn't start off that way: we began our analysis by throwing real-world collections at it. As we went down that path we hit a fair bit of a learning curve, particularly when we realized that working with collections that were not well-known datasets created some issues. In other words, we could run it against a collection and come away wondering, is that really what's in this? So there is real value, when you're starting to use the advanced features, in starting out with datasets that are well characterized. That was one of our findings that we hope is useful for folks thinking of using these advanced features going forward.

Another important note is that some of our collections are quite large, so there are definite implications when you're figuring out how to apply advanced features to collections. Even though bulk_extractor is designed to be a very efficient application, how you select certain advanced features and how they work in combination does not take a trivial amount of processing time; it can be quite demanding on the machine you're running it on, and Sandy will talk more about that in the following slides.

And finally, as we've been preparing for this webinar and exploring how we wanted to use the advanced features here at Stanford, I've felt a bit like the Cheshire Cat, pointing Sandy toward a set of doors, or interesting paths, for her to explore. Thanks, Sandy, for being so accommodating as we've tried new and different features; it's been a definite and exciting learning experience. So, Sandy, would you mind taking over from here?
Okay, great. Well, good morning, everyone. I want to thank Michael and Sam for inviting me to be part of this project. I'm going to move pretty quickly through these slides because I have a lot of information to cover, and I'm going to try not to get too technical. This was my first pass using bulk_extractor, so a lot of this was exploratory for me. What I hope to do today is give you definitions of some of the general options available in that screenshot Michael showed earlier, because one of the first things I had to do in approaching this tool was to ask: what is a stop list, what is a word list, what is an alert list, what is this find regex text file, how do they all work together, and how are they different? Then you get to the piece of configuring them so they run and give you relevant data. So I'm going to run through some definitions quickly, then define some of the requirements, because there are technical requirements depending on which general option you select to run. Then I'll take you through a sample run configuration. I actually did three different runs with several different data sets, and for the sake of brevity I picked one out to show you some screenshots and explain the results, so that we can have a little bit of discussion about that.
With that, the definitions I took right out of the bulk_extractor manual; these were just my references as I was starting with the tool. Essentially, the stop list is really a white list: within your institutional context, if you've determined that there's data that doesn't need to be processed, you can create a stop list to tell the scanners to ignore it. It's also helpful to use that in combination with an alert list: if you have a list of terms that you want to treat in a customized way, or subject to further analysis, the alert list functions as a kind of red list. In the middle there you'll see the word list, which is something used for password cracking, and again this goes back to the original intent of the tool: it was designed for forensic applications, generally in a law enforcement context, where investigators might be interested in getting into somebody's hard drive or phone. Having the tool scan the device and build a word list of every word found on it is useful for feeding into a password-cracking program. Finally, the one I focused on most was the find regex text file, which is one of the options in bulk_extractor's general options list. What I did was take a custom lexicon file from the ePADD project here at Stanford, reformat it, feed it into that particular scanner engine, and let it work with the other scanners, and that had some interesting results. It was quite a bit of work to put all those pieces together, but I eventually did get it working and got some results.
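On the command line, these general options correspond to flags. A minimal sketch, with every filename a placeholder rather than one of the files used in our runs:

    # -w  stop ("white") list: features to suppress from the reports
    # -r  alert ("red") list: features to flag for follow-up
    # -F  file of regular expressions / terms for the find scanner
    # -b  banner text prepended to every output file
    bulk_extractor -o /path/to/output \
        -w stop_list.txt -r alert_list.txt \
        -F find_terms.txt -b banner.txt \
        /path/to/image.E01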
There are some specific requirements you need to meet in order to run these general options with the scanners. Here on the left you can see the configuration for bulk_extractor: I had all the default bulk_extractor scanners selected, and then under the general options I selected the option to use the find regular expression text file; you can also choose the alert list in that section. What's significant to understand is that the file has to be formatted in a way bulk_extractor can use. I used the same lexicon file for both the regex text file scan and the alert list scan that I tested.

One of the other things I discovered is that, for processing, bulk_extractor could only handle the E01, AFF, or raw image formats. Again, I'm new to this, but that's what I determined from my research and from using the tool, because my initial approach used the AD1 format, which is a custom format from FTK Imager, and that threw up some roadblocks for me. I eventually found some sample images online to use instead, and that sample image reference is in the references section.

For the custom lexicon file, the way I had to format it was one term per line, making sure each line ended in a newline. When I ran it I didn't make any customizations regarding case sensitivity; I believe most of the terms in the file were all lowercase. But that is something to pay attention to: the tool is case sensitive, so if you have "social security" in only lowercase letters, it's only going to identify terms that match that case, which is probably not what you want. You probably want to account for case sensitivity when you're scanning a drive.
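As an illustration of that formatting step, here is a hedged sketch. It assumes the exported lexicon (epadd-faculty-lexicon.txt is a hypothetical filename) arrives as comma-separated terms; it produces the one-term-per-line, newline-terminated file that the -F option expects, and then adds all-lowercase and all-uppercase variants as a crude partial workaround for case sensitivity (mixed-case spellings would still be missed):

    # one term per line, trimmed, blank lines removed
    tr ',' '\n' < epadd-faculty-lexicon.txt \
        | sed 's/^[[:space:]]*//; s/[[:space:]]*$//' \
        | sed '/^$/d' > faculty-terms.txt

    # add lowercase and uppercase variants so case-sensitive matching catches more
    awk '{ print; print tolower($0); print toupper($0) }' faculty-terms.txt \
        | sort -u > faculty-terms-cased.txt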
so here’s the hardware configuration I did run this on a quad-core laptop and I
had 16 gigs of ram winning running Windows 7 64 bit and I actually ran it
within a virtual guest virtual machine and I think this is significant because
the configuration of my a virtual machine
although the hard drive was 32 gigabytes which was adequate to for some of the
collections that I was processing it was really the the memory space and the
guest swap file space I think that’s probably really important if I were to
try and scan a larger size collection the image file that I was using was only
I think 500 over 500 megabytes but some of those image files can get fairly
large and that’s where paying attention to the size of your memory space and
your swap file size is going to matter because bulk extractor is designed to be
a multi-threaded application and it will maximize itself with the number of cores
and the number of threads that are available on those cores and it will use
all available memory and swap file space so you know configuring the hardware to
kind of maximize the efficiency of the program is a real important
consideration and I was writing this data to an external USB 3 hard drive
that was encrypted with veracrypt so you know there there were a lot of moving
Hardware parts here and I don’t want to get off in to
the rest of that but those were all of the considerations that I had to be
aware of as I was toying and monkeying with well how big is the image that I’m
using well how much Hardware am i throwing at
it you know how much throughput am I going to get with USB 3 where is it
going to hang up so these were a lot of the questions that that were running
through my mind as I was I was approaching this so here you can see the
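If you need to hold some of the machine back (for instance, when the host and guest share cores), bulk_extractor's -j flag caps the number of analysis threads. A small sketch with hypothetical paths:

    # limit the run to two worker threads so the VM stays responsive
    bulk_extractor -j 2 -o /path/to/output /path/to/image.E01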
Here you can see the sample command-line structure of the commands I input. This is all specific to my particular configuration: you can see the output destination path starts with /media/veracrypt, which is my local virtual machine mount path, and NTFS practice 2017 is the directory the image was sitting in. In this case the output directory I used was "find NTFS practice 2017"; because I ran several different scanners against this image, I created different output directories, named "find," "alert," or whatever tool I was working with, as subdirectories within the NTFS practice 2017 directory. Then on the command line you use the uppercase -F switch and give it the path to the custom lexicon file you want to use for your regular expressions or alert list. You can see here that I used ePADD's faculty lexicon; I think it has 42 terms, and I chose it for its brevity, because there's also a sensitive lexicon with over 850 terms and I wasn't quite sure how long that would take to process. Finally, you give it the source it's supposed to scan, and here you can see I used the NTFS practice 2017 E01 image file.
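Pulling those pieces together, here is a reconstruction of the kind of invocation described above; the exact paths and filenames are hypothetical stand-ins for the VeraCrypt-mounted volume and directory names from the run:

    bulk_extractor \
        -o /media/veracrypt1/NTFS_practice_2017/find_NTFS_practice_2017 \
        -F /media/veracrypt1/lexicons/epadd_faculty_lexicon.txt \
        /media/veracrypt1/NTFS_practice_2017/NTFS_practice_2017.E01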
it’s pretty significant and the fines and this is how the fine scanner verses
the light grip scanner works and how bulk extractor is set up is there the
newer version and I’m not quite sure which version has changed in but the
newer version includes a scanner called light grep and it’s a significant step
up in terms of processing efficiency from the fine scanner and I’ve I’ve
noted here and giving you you know a description of how those efficiencies
work and essentially the light grip scanner what it does is it takes that
group of terms that’s in that custom lexicon file and then it grabs a process
and chunk from the disk and it looks for all those terms in that processing chunk
and then it just goes through the disk chunk by chunk looking for terms whereas
the find scanner is much less efficient and it will take one term and it will go
through the entire disk image to look for that one term and it’ll take the
next term and it’ll go through the entire disk image to look for that term
so you can imagine if you have an 853 term lexicon file that you would go over
a single disick disk image 853 times and that’s not very efficient so you know
figuring out whether or not the lightgrip scanner
was working or running was my most important focus here because I’m like
well I want to use the most important engine here so I’ve included some links
to some further detail on the lightgrip scanner and I also want to note that
these scanners work in conjunction with the custom general options that you use
so if I am giving bulk extractor the custom lexicon file and I’m using that
find regex option to point to that custom lexicon file and I’m saying go
out to the disk and find all of these terms it’s going to work in conjunction
with the scanners that I’ve selected on that image file that Michael showed you
of the bulk extractor image it it it gives you that list of scanners and so I
created a slide here to show you there are differences in the scanners between
the bulk extractor versions so if you’re thinking of running this and you want to
run the light grep scanner you need to make sure that you have the 1.6 – dev
bulk extractor version because that’s what I was running and that’s what had
the light grip scanner so I included this screenshot to show you that on the
previous screen shot those scanners were actually missing and and that’s on the
right and then on the left you can see where they’re included this screenshot
is is from my installation and it shows some additional scanners that can also
be used that I won’t get off into the detail about what those are but
essentially making sure that the lightgrip scanner is installed means
that bulk extractor will use the light grep scanner or engine instead of the
fine scanner which is which is what you want so here’s the
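As a quick sanity check, something like the following should show whether a given build includes the lightgrep scanner; this is a sketch that assumes -H prints per-scanner details, as it did in the builds we looked at, and that the scanner names are "find" and "lightgrep":

    # list scanner details and look for lightgrep
    bulk_extractor -H | grep -i lightgrep

    # if present, it can also be enabled explicitly while disabling the find scanner
    bulk_extractor -e lightgrep -x find \
        -o /path/to/output -F faculty-terms.txt /path/to/image.E01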
Here are the results, or the start of the scanning that I did. One of the things I was concerned about was: how am I going to know if my CPU is overheating or maxed out? Is it going to hang? Am I going to run into problems? I had to keep an eye on what was going on with my hardware, so I installed a program called Open Hardware Monitor, and as you can see here I was paying close attention to my temperatures, the load my CPU was running at, how much further I thought the process could go, and making sure it wasn't getting hung up or overheating anywhere.

Here's the finish of the run. It only took 12 minutes to run through the drive, but it was really confusing, because it says it only processed 272 megabytes of the 524-megabyte source image. I'm not quite sure what happened there; I haven't had a chance to go back and examine those results to see why that happened and what it was or wasn't processing. But I wanted to present that and let you know it was the first thing I looked at: this was the feedback the bulk_extractor run gave me when running the default scanners with the faculty lexicon through the find regular expression option.

I also ran an alert feature file scan, and it only returned one term using that same faculty lexicon. Again, I didn't have an opportunity to explore why that happened; I believe it is context sensitive, but I need to spend more time understanding how that particular piece of the tool works.

Here are the results between the find and the lightgrep engines, if you will. What you can see is that the term "service" actually had two different counts, and once more I didn't have time to figure out why. My first question was whether the engines were treating encoding differently: were these terms located in parts of the disk image encoded as UTF-8 or UTF-16 or some other format that caused one engine to count them and the other not? That would require more time than I had for this project.

So with that, I just want to turn it back over to Sam and Michael and open it up for questions. Hopefully I didn't overwhelm you with technical information, but what I was trying to do was quite technical, so I hope this was useful for you. Thank you.

Great, thanks so much, Sandy and Michael.
That was really helpful. For the people on the call, especially those who are maybe just getting started with bulk_extractor or have only used it a couple of times, I hope that was a deeper bit of information. I thought it was particularly helpful, Sandy, that you decided to monitor the hardware while the tool was running. Whether for virtual setups or for folks running it on a dedicated Ubuntu Linux install, especially for larger data sets, being able to get some window into what's happening with the hardware while the tool is processing will probably provide some useful benchmarks going forward. That was particularly useful for me, but enough about me; let's open it up for questions from other folks, either for Sandy or Michael, on what they just went over. Anybody have anything burning, either in the chat or on your mics?

Hey, this is Walker. Oh, somebody typed something in as well. Okay, go for it, Walker.
All right. I just wanted to ask Michael: you mentioned trying to map or organize bulk_extractor outputs to your existing IT security classifications. We've got something similar here, and I was curious whether you've worked directly with your IT folks on this, to see what they think of bulk_extractor as a tool that could align outputs to those categories, or whether anybody from the IT team has given their take on bulk_extractor in general.

Is my mic on? Can everyone hear me? Yep. Great. That's a really good question, Walker. What's interesting is that there's an effort at Stanford right now where the Information Security Office has created a tiger team, which I've joined, and we have just started those discussions about whether bulk_extractor is a tool that could potentially be used. One of the really interesting things I've come across, and I was tempted to put in a slide to this effect, is that our ISO is currently using Identity Finder. Compared with that particular piece of software, I'm really struck by just how customizable and flexible bulk_extractor is. That's really useful in some respects, but it's also a real challenge, as we've found with some of the runs Sandy has been doing for us: the number of features, the degree of customization, and the interactions between the different features make it a bit of a challenge. So I'm fairly hopeful, but once again I really think we need to get a better understanding of some of these anomalies and what's causing them, and figure out a little more about how the tool is actually generating the results it does.
This is Cal at UNC. I just wanted to point out that we had a similar experience here. We also use Identity Finder, and I think one question is whether the institution values consistency over efficiency. Running bulk_extractor against files, just anecdotally on my own machine, where I also had to run Identity Finder, bulk_extractor was both more effective and profoundly faster. But part of that is that the way Identity Finder has been rolled out at our university doesn't allow us to change any of the options; they want it run consistently on everybody's machines. So it could very well be that some institutions simply don't want that kind of flexibility when it comes to running across all the machines in their enterprise, because they want a reportable result that they know was run with the exact same configuration. Essentially, if all the options are grayed out for you in Identity Finder and it just runs the defaults, it's incredibly inefficient, but that's kind of what a lot of institutions probably want.
So I'll just relay Tim's question from the chat, unless, Tim, you want to shout it out; maybe you're not able to in your current audio situation. Tim's question was: recognizing that you aren't using this in production, do you have any plans for what will happen when you do identify some PII? What are the next steps in your workflow or process, or what are you considering as next steps?

Yeah, I think we need to present some of these results from known collections to our digital archivists and our head of manuscripts, and verify that these are things they really want to know and that the level of detail is appropriate; as you can see when you look at some of the results, they can be a little overwhelming. We want to verify that first, and also whether there are certain configurations, alerts, or scanners that are more useful than others that we would want to run against a bulk number of collections. My particular goal is to generate audit reports: once we figure out the basic set of scans we want, to have them run against everything and generate audit reports. Another question we haven't talked about yet is what you do when you get all these results; ultimately there has to be some sort of human process to evaluate them. But I still think there's real value in coming up with a set of scans you would run against everything, and then having that audit trail in existence. So I hope that answers your question.

Other questions, feedback, or responses?
One other thing, just to circle back to that question: one goal, at least for me, is to get away from the somewhat piecemeal way we're running these scans right now. That's why I'm hopeful we'll be able to create some default configurations and scans that we run against everything, rather than having to turn our digital archivist loose and wonder, did he scan this, did he not scan that? I want to get away from that and use automated processes to leverage the flexibility of the tool, and to get some of the manual effort out of the loop if at all possible, because I can see great value in scripting it so that everything loaded onto a particular file store would be automatically scanned in the background. That would be really, really useful.

Yeah, that makes a lot of sense. I guess I have a question for those on the call, just for discussion purposes: has anybody else already experimented with alert lists, stop lists, or even some regular expression formulations? Even if only in trial cases, have you run into some of the same issues that Sandy did? Anything to add from other experiences in this regard? Maybe you're on this webinar because you haven't done that yet, but I'm just curious to hear if anybody has experience they'd like to share.
those Easy’s rather regular expressions to find Canadian Social Insurance
numbers and Tim was that because the default
yeah the default scanner was just looking for the US based what syntax for
Social Security numbers interesting default sort of parameters right right
different numbers in different places you know I actually had a question for
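For anyone who wants to try something similar, here is a hedged example of a find-list entry for Canadian SINs (nine digits, optionally grouped 3-3-3). It assumes the find/lightgrep engine accepts this interval syntax, and a pattern this loose will certainly produce false positives, so treat hits as leads rather than confirmations:

    # append a Canadian Social Insurance Number pattern to the find list
    echo '[0-9]{3}[- ]?[0-9]{3}[- ]?[0-9]{3}' >> find_terms.txt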
You know, I actually had a question for Tim. As part of this whole process I installed and ran Tim Walsh's CCA Tools, just as an exercise, and I was interested in figuring out how to plug the custom regex usage, the custom lexicon, into the CCA Tools. Have you done that, Tim? He says he hasn't. Okay. Do you think it would be possible? He hasn't explored it but isn't opposed to figuring it out. Okay, great. I'm interested in looking at how to do that, because making those custom lexicons part of the default scan in the CCA Tools would be really kind of cool. Adding some extra command-line options, right; finding the right place to stick that in. Yep, thanks, Tim. Awesome. Cool.
Project ideas spinning off; one goal of the webinar reached. Hopefully many other goals were reached as well. Other thoughts, questions, feedback, harebrained ideas? Did anybody learn anything new? You can just raise your hand.

Walker raised his hand. I actually have one more question, Sam. Yes, go for it.
So I know that in the past you at Stanford have used FTK in your workflows. I don't know how prominently that factors into your workflows currently, but I'm wondering if you have any thoughts about bulk_extractor versus FTK, maybe advantages you see. You already talked about some of the advantages over Identity Finder in particular, but do you have any thoughts about how you might use this instead of, or in tandem with, FTK?

This is Michael; that's a really good question. I think one of the things we discovered when Sandy and I were trying to run this against real-world collections is that we have not been as consistent as perhaps we should have been about what tools we used and what types of disk images we've been generating, which is a little concerning, particularly when we ran into the issue with the AD1 format. We're in the process of hiring a new digital archivist to help Peter Chan out, and one of our hopes is that we can systematize the tools we're using and how we're using them, and document that a lot better, because inconsistencies crop up depending on what tool you're using. With FTK Imager, I understand why some folks really like it, but it's one more tool in the toolbox, and if you're going to use any tool you need to document what you're using and what your outputs are, because you might have to go back at some point; hopefully not, but these inconsistencies with the AD1s we've been generating are concerning. So I hope that's useful.

Yeah, definitely, thanks.
don’t do what we did lessons learned any other questions discussion ideas I’ve gotten one Sam that kind of came
out of this sort of out of the this investigation kind of work that that
sandy in particular and I have been working on I I think that there’s
definitely a greater need out there for and and maybe this is already happening
and we’re just not plugged into the right channel but you know whether it’s
a kind of a working group or a group of people that that are really kind of
running are discussing sort of these outputs and
these issues that come out these little rabbit holes and inconsistencies
I think there’s it would be useful to have more than just one institution kind
of exploring and playing with those sorts of things I know everyone’s super
busy but it’s just an idea that that’s kind of popped up based on what happened now it’s a
a great a great point Michael and I think getting into Tim’s first question
about sort of you know when you find PII ID and then what do you do I mean I
think that’s also just you know what when you find these these questions that
come up in the results what are what are the the next steps in terms of you know
for either further investigations or making a decision around the resources
needed for that additional investigation so making a sort of cost-benefit
analysis I think that’s a topic that would be really worthwhile for a working
group you know either within the VCC or collaboration with other groups yeah I’d
be great to – you know maybe bring this to a monthly call as a topic I think we
could sort of extend that into that forum to see what others others might
have insights about you know or just you know find a way to start you know
figuring out a way to sort of share some of these these outputs in a way that
people are comfortable with to maybe get some other eyes on or at least just sort
of start to characterize some of these scenarios so yeah I think it’s good idea
maybe if there’s others that are interested we could again sort of think
about bringing this at least – and what they call it at least continue to
discussion I I had a question I am you know in
I had a question. In going through all of this with Michael, we raised the topic of how you choose which format to use when you create a disk image. In the FAQ on the BitCurator website, I noticed a mention that the Advanced Forensic Format had an issue with NTFS partitions and scanning them, and that there were known issues with that. I don't know how current that information is, and I wanted to know if there are any resources somebody could point me to with more current information about the Advanced Forensic Format and bulk_extractor.

This is Cal. Essentially, AFF, which was first initiated by Simson Garfinkel and then picked up by some colleagues, was, several years ago, just abandoned, very publicly. Garfinkel made a statement that it should be deprecated and that people shouldn't use it anymore, because the notion was that the Expert Witness Format had been reverse-engineered and there was free software out there to parse it and deal with it. So any future trajectory of hoping that things get addressed in AFF that haven't already been addressed lies essentially in the AFF4 initiative, which is still ongoing: a next-generation AFF format whose future is still a bit uncertain; they regularly report on it at the research conferences. It was kind of a painful process for us, because for a lot of philosophical reasons we wanted to advocate for AFF when we were out teaching people about these things, but when the people behind the primary open-source tools we rely on decide they're just not going to support it anymore, we really didn't have much choice but to let people know that probably either raw or Expert Witness Format was the right choice. It doesn't mean anybody who chose AFF is in a terrible position, because given the available libraries you can always convert that AFF either to Expert Witness Format or to raw. But it's essentially a lost cause to hope for additional efforts to address the existing limitations of AFF itself, because it's not being actively developed.

Okay, thank you.
So at this point we've got just a few minutes left, so I'll make another call for any last questions for Sandy or Michael before we close, or for anybody else.

Okay, well then please join me again in thanking Michael and Sandy for what clearly took a good amount of effort: both conducting these investigations and putting together a really excellent overview of these topics. I think a lot of people are exploring these issues; maybe you two are just a little further along, so if this helps spur other folks to take the dive themselves, I think that's a super outcome. I'm going to pick up these ideas about continuing the discussion on next steps, and we'll take things from there. Thanks again to everybody, and I hope everybody has a nice weekend. Thanks, everyone.

Great, thank you.
