Autopsy Sections 5-7
___
Notes (5/25/20)
Section 5: Analyzing Data Sources
Outlines the basic approach taken to analyzing the data sources in a case:
● Ingest modules
● File prioritization
● Ingest filters
What are ingest modules?
● Plugins responsible for analyzing the data on the drive
○ Hashing, keyword search, web activity, registry analysis, file type identification, extension mismatch, etc.
○ Many third-party (and other) ingest modules are available online and work in different ways
2 types of ingest modules
File ingest modules
● MD5/SHA-1 hash calculation, hash lookup, EXIF extraction, adding text to the keyword index, etc. (hash calculation is sketched below)
● All files from the disk go through a pipeline of ingest modules in a specific order for analysis
● Can see inside .zip files (unlike data source ingest modules) because extracted files are fed back through the pipeline after unzipping
● Run on every file in the data source unless ‘ingest filters’ are applied (filters keep files from going through the pipeline)
● Many can run in parallel
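As a rough, standalone sketch of what the hash-calculation step does per file (plain Python, not Autopsy's actual module API; the chunked read is just so large files never have to fit in memory):

```python
import hashlib

def hash_file(path, chunk_size=1024 * 1024):
    """Compute MD5 and SHA-1 in one pass over the file's content."""
    md5, sha1 = hashlib.md5(), hashlib.sha1()
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            md5.update(chunk)
            sha1.update(chunk)
    return md5.hexdigest(), sha1.hexdigest()
```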
Data source ingest modules
● Given a reference to the data source itself instead of individual files
● Examples: web browser analysis, registry analysis
○ Web browser analysis can work this way because we know which databases to look for
● These modules look for specific files at known paths
● When configured by the user, some ingest modules are file ingest modules, which run until all files are scanned (a long time); others are data source modules that may take only a couple of minutes
File prioritization
● The ingest manager runs in the background and schedules files for analysis
● Files go down the pipeline based on priority: user folders, then program files/other root folders, then the Windows folder, then unallocated space (see the sketch below)
● If a second image is added while the first is still being ingested, they run in parallel
○ Files from the second image may initially have higher priority as image #1 moves into lower-priority folders; the two will eventually be in sync
○ If you want image #1 to finish first, wait for it to finish before adding the second image
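A toy sketch of this kind of priority scheduling; the folder prefixes and tier numbers are made up for illustration and are not Autopsy's actual scheduler:

```python
import heapq

# Hypothetical priority tiers, lower = sooner, matching the order above.
PRIORITIES = [("/Users/", 0), ("/Program Files/", 1),
              ("/Windows/", 2), ("/$Unalloc/", 3)]

def priority_of(path):
    for prefix, tier in PRIORITIES:
        if path.startswith(prefix):
            return tier
    return 1  # other root folders share the middle tier

def schedule(paths):
    """Yield files in the order a scheduler like this would process them."""
    heap = [(priority_of(p), p) for p in paths]
    heapq.heapify(heap)
    while heap:
        yield heapq.heappop(heap)[1]

for p in schedule(["/Windows/system32/cmd.exe",
                   "/Users/alice/Desktop/notes.txt",
                   "/$Unalloc/0-512000"]):
    print(p)  # user file first, Windows file next, unallocated last
```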
Data source-level modules
● Run on the entire data source
● Better for more focused analysis techniques
○ E.g., query the database for a set of files, analyze that set, and post results to the blackboard
● Compare with file-level modules, which review every file to determine if it is relevant, analyze the relevant files, and post results to the blackboard (results come and go faster)
Running ingest modules
● Option 1: after a data source is added, you are prompted to configure ingest modules
● Option 2: right-click on a data source to run ingest modules
Settings
● Modules may require configuration
● Basic options may appear when enabling a module
● More detailed configuration is available from “Global Settings” or Tools → Options
Unallocated Space
● You can choose whether to analyze unallocated space files; they go through the pipelines last
○ They may or may not have interesting data; you can choose whether they go down the pipelines at all
■ There is a pulldown option for unallocated space and other filters
Ingest filters
● Allow only certain types of files down the pipelines
○ E.g., only JPG/PNG files, or only files in the Desktop folder
● Useful for triage/preview so time is not wasted on analysis; created in the Options panel
“Official” ingest modules
● Includes the modules that ship with Autopsy: hash lookup, keyword search, embedded file extraction, recent activity, email, etc. (the bulk of this course)
Blackboard artifacts
● Ingest modules will save their results as Blackboard Artifacts
● Artifacts have a type and one or more attributes
○ Types can be web bookmark, hash hit, encryption detected, etc.
● Autopsy comes with many predefined types and modules can make their own
Attributes
● A type and value pair
● Examples for a web bookmark artifact: URL: www.autopsy.com, DATETIME: April 1, 2020
● Modules choose which attributes to include (see the sketch below)
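Conceptually, an artifact is just a typed record holding type/value attribute pairs. A minimal sketch (the strings mirror the bookmark example above; Autopsy's real artifact classes are Java, not these):

```python
from dataclasses import dataclass, field

@dataclass
class Attribute:
    attr_type: str      # e.g. "URL" or "DATETIME"
    value: object

@dataclass
class Artifact:
    artifact_type: str  # e.g. "WEB_BOOKMARK", "HASH_HIT"
    attributes: list = field(default_factory=list)

# The web bookmark example from above:
bookmark = Artifact("WEB_BOOKMARK", [
    Attribute("URL", "www.autopsy.com"),
    Attribute("DATETIME", "April 1, 2020"),
])
```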
Viewing
● Shown in the tree under “Extracted Content” and in the “Results” content viewer
● Included in reports
Section 6: Hash Lookup Module
Outlines the use and configuration of the Hash Lookup ingest module
What does it do?
● Calculates the MD5 hash of files
● Stores the hash in the case database
● Looks the hash up in hash sets
● Marks files as Known (NSRL; could be good or bad) or Known Bad/Notable
Why should you use it?
● To include MD5 hash values in your reports
● To make ingest faster by skipping known files with NSRL
○ 18 min vs. 9 min in a recent test
● To hide known files from the UI
● To identify notable/known bad files
● To keep the Central Repository up to date and correlate with past cases (matches against files seen in past cases)
Hash Calculation Step
● Skips unallocated space files
● Calculates the MD5 of the file content
● Stores the hash in the case database
● If a file already has a hash, it is not calculated again
Hash Set Lookup Step
● Looks up the MD5 in all of the configured hash sets
○ Does not stop at the first hit
● Supported hash sets:
○ NIST NSRL (flagged as KNOWN), EnCase, SleuthKit SQLite format (.kdb files), md5sum, HashKeeper
Known Status of a File
● Every file has a ‘known status’: Notable/Known Bad, Known, or Unknown (the default)
What can you configure?
● Which databases to use, and whether to always calculate hashes
Where do you see the results?
● ‘Known’ files may be ignored by other modules (the module’s choice) and hidden from the ‘Views’ area (hidden by default; user option)
● They can also be hidden in the directory hierarchy (user option; the default is to not hide them)
● Notable hits are shown in the “Hashset Hits” section of the tree (organized by hash set name) and as a message in the ingest inbox
Example 1
● The keyword search module ignores ‘known’ files; searching for the term “windows”:
○ Without NSRL there are 6330 hits; with it, only 2311 hits
● The file type view also ignores known files: 26 Office files without NSRL, 6 with it
Analysis Shortcut
● You may only know a small number of notable files; these can be clues/a good start
● There are typically more not-yet-known notable files near a known notable file
● Right-clicking a hash hit will bring you to the file’s location
Configuring Hashsets
● Open the hash lookup module settings:
○ Tools → Options menu
○ “Global Settings” button from the ingest panel
● Buttons at the bottom import/create a new set
Importing a hash set: indexes
● Most hash sets aren’t sorted
○ This makes searching very slow: the MD5 values can be out of numeric order, so each one has to be checked
● When you import an unsorted hash set, Autopsy builds an “index” (a sorted version), as sketched below
○ It is a separate file named with “-md5.idx” at the end, with values in order (0, 1, 2, etc.)
○ Lookups then use binary search: start in the middle of the set to know which range to look at, ignoring irrelevant values and half of the remaining entries each time
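A minimal sketch of the binary-search lookup, assuming a simplified index file with one hex MD5 per line (the real -md5.idx layout may differ):

```python
import bisect

def load_index(path):
    """Load a sorted index of hex MD5 strings, one per line (a simplified
    stand-in for the real -md5.idx format)."""
    with open(path) as f:
        return [line.strip().lower() for line in f if line.strip()]

def in_hash_set(sorted_hashes, md5):
    """Binary search: halve the search range each step, so a million-entry
    set needs ~20 comparisons instead of a full scan."""
    i = bisect.bisect_left(sorted_hashes, md5.lower())
    return i < len(sorted_hashes) and sorted_hashes[i] == md5.lower()
```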
Destination
● Hash sets can be stored locally or remotely
○ Local: only a single computer has access
○ Remote: stored in the Central Repository, and all users have access
● The Central Repository also allows information to be stored about where the hash set came from
Importing a Hash set
● Browse to the hash set/index (.idx file) and pick a destination (local/central)
● Edit the display name and verify the type
● Disable inbox messages if the set has lots of false positives
● If the database does not have an index, you will need to press the “Index” button; this can take a while
Miscellaneous
● After the index is created, copy the -md5.idx file for a co-worker so they don’t need to re-index (or use a remote hash set)
● An index for the NSRL can be downloaded from sleuthkit.org (much easier)
● If a hash set is overwritten (e.g., a new version of a shared set), re-index it to get the updates
Create a Hash set
● Create a new local or remote set
○ Local: SQLite database format; remote: through the Central Repository
● Create it with the Options panel; specify the location, name, and type
● Add hashes/files by right-clicking and choosing “add file to hash set”; you can only write to sets Autopsy created, or create a new one
Adding hashes: copy and paste
● Add a list of values to an existing set through “add hashes to hash set” from the Options panel
○ Copy and paste one or more MD5 hashes
End of case
● Some examiners maintain a hash set of all previously tagged files
● Option 1: update a local hash set at the end of the case using a report module
● Option 2: enable the Central Repository to auto-update when files are tagged as notable
Section 5: Lab Notes/Answers
1. Hash Lookup
2. File Type Identification
3. Extension Mismatch Detection
4. Embedded File Extractor
5. Exif Parser
6. Email Parser
7. Correlation Engine (all of these have to be on)
1. By 15%; the total number of hits under “Hash Hits” after running the Hash Lookup ingest module is 6.
2. The files are RN.jpg and f_000239.
3. There are 7 JPG files where the notable hash hit was found.
Section 7: The “Simple” Modules (File type ID, Extension Mismatch, Exif,
etc.)
Outlines the use of several small, focused modules: file type identification, extension mismatch, EXIF, interesting files, encryption detection, etc.
File type module
What does it do?
● Determines the file type based on signatures
○ E.g., identifies a JPEG by the file starting with 0xffd8
● Stores the type in the database for other modules to use
Why would you use it?
● To more accurately identify the type of a file
● Many modules depend on it
● No configuration; it is either on or off
Details
● A file-level ingest module
● Uses the Tika open source library
● Detects hundreds of file types and reports them as a MIME type
○ application/zip, audio/mpeg, image/jpeg, application/octet-stream (unknown type)
● Results can be found in the file metadata, reports, and other places
Custom file types
● You can define your own file types if they are not auto-detected (see the sketch below)
● Tools → Options → File Types
● Specify the MIME type (you can make one up), the offset of the signature, the signature itself (bytes or an ASCII string), and whether you want to be alerted when the signature is found
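The core signature check can be sketched as follows; the signature table is illustrative (real detection uses Tika and covers hundreds of types):

```python
# Hypothetical signature table: (MIME type, offset, signature bytes).
SIGNATURES = [
    ("image/jpeg", 0, bytes.fromhex("ffd8")),
    ("image/png", 0, bytes.fromhex("89504e470d0a1a0a")),
    ("application/zip", 0, b"PK\x03\x04"),
]

def detect_type(path):
    """Return the first MIME type whose signature matches at its offset."""
    with open(path, "rb") as f:
        header = f.read(64)  # enough for common magic numbers
    for mime, offset, sig in SIGNATURES:
        if header[offset:offset + len(sig)] == sig:
            return mime
    return "application/octet-stream"  # unknown type
```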
File Extension Mismatch Module
What does it do?
● Compares a file’s extension to its detected file type
● Flags the file if the extension is not associated with that type
Why would you use it?
● To detect files that someone may be trying to hide
● Results are in the results tree: extension mismatch detection → positives (a minimal check is sketched below)
Configuration
● Which file types to focus on, whether to skip known files (default: true), and the list of extensions for each file type
● Go to the config panel through “Global Options” or Tools → Options to change file extensions
○ Click “New Extension” to add an extension or “New Type” for a new type (it will tell you if the type can’t be detected)
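A minimal sketch of the mismatch check, assuming a hypothetical map from detected MIME type to acceptable extensions:

```python
import os

# Hypothetical map: detected MIME type -> extensions considered valid.
VALID_EXTENSIONS = {
    "image/jpeg": {".jpg", ".jpeg"},
    "image/png": {".png"},
    "application/zip": {".zip"},
}

def is_mismatch(path, detected_mime):
    """Flag when the detected type has a known extension list and the
    file's actual extension is not on it."""
    ext = os.path.splitext(path)[1].lower()
    allowed = VALID_EXTENSIONS.get(detected_mime)
    return allowed is not None and ext not in allowed

print(is_mismatch("report.txt", "image/jpeg"))  # True: a JPEG hiding as .txt
print(is_mismatch("photo.jpg", "image/jpeg"))   # False
```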
Exif Module
What does it do?
● Extracts the EXIF structure from JPEGs and stores the metadata on the blackboard
● No configuration
Why would you use it?
● To identify the camera type used to take a picture
● To get the time the picture was taken and the geo-coordinates of where it was taken
● Results are in the results tree with all of the pictures; it is not a general-purpose EXIF tool and extracts only the core fields (a rough equivalent is sketched below)
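A rough stand-in using the third-party Pillow library (Autopsy does not use Pillow; this just shows the kind of core fields involved):

```python
from PIL import Image, ExifTags  # third-party: pip install Pillow

def core_exif(path):
    """Pull a few core EXIF fields (camera make/model, timestamp),
    similar in spirit to what the module stores on the blackboard."""
    exif = Image.open(path).getexif()
    wanted = {"Make", "Model", "DateTime"}
    return {ExifTags.TAGS.get(tag_id): value
            for tag_id, value in exif.items()
            if ExifTags.TAGS.get(tag_id) in wanted}

print(core_exif("photo.jpg"))  # "photo.jpg" is a placeholder path
```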
Embedded File Extractor
What does it do?
● Extracts embedded files so they can be analyzed
● Opens ZIP, RAR, and other archive files
● Used so that all files on the system can be analyzed
● No configuration
Details
● Extracted files are saved inside the case folder
● They are added back into the ingest pipelines for analysis (sketched below)
○ Same priority as the parent file
● An archive is flagged if it was password protected
○ Right-click to supply the password and open it
● Results are in the tree like other files
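A sketch of the extraction step for ZIPs using Python's standard zipfile module (Autopsy's extractor handles more formats; zipfile only supports legacy ZipCrypto passwords, not AES):

```python
import zipfile

def extract_archive(path, dest, password=None):
    """Extract a ZIP so its contents can be re-queued for analysis,
    or flag it when it is password protected."""
    with zipfile.ZipFile(path) as zf:
        encrypted = any(info.flag_bits & 0x1 for info in zf.infolist())
        if encrypted and not password:
            print("flagged: archive is password protected")
            return []
        zf.extractall(dest, pwd=password.encode() if password else None)
        return zf.namelist()  # names to feed back through the pipelines
```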
Email module
What does it do?
● Searches for MBOX, PST, and EML files
● Adds email artifacts to the blackboard
● Adds attachments as children of the messages; groups messages into threads
● Configuration: none
● Results are in the communications viewer and the tree
Why should you use it?
● Identify email based communications
Interesting Files module
What does it do?
● Flags files and folders that you think are interesting
Why should you use it?
● To always be alerted and notified when such files are found
○ iPhone backups, VMware images, Bitcoin wallets, cloud storage clients
Config
● You need to create rules in the program
○ Rules are organized in the Options panel
● Autopsy comes with no rules out of the box
● Rules are grouped into sets (a set name is required)
Rules
● Rules have several optional fields
● Type: match files, directories, or both
● Name: can be the full name, extension only, or a substring/regular expression
● Parent path: can be a substring (e.g., “/Desktop/”)
● Size, MIME type, dates
Example
● Set name: VMware (matching is sketched below)
● Rule 1
○ Type: files, full name: vmplayer.exe, rule name: Program EXE
● Rule 2
○ Type: files, extension: vmdk, rule name: VMDK file
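A toy version of rule matching against the VMware example above (the rule representation is hypothetical):

```python
import os

# Hypothetical representation of the VMware rule set above.
RULES = [
    {"set": "VMware", "rule": "Program EXE", "full_name": "vmplayer.exe"},
    {"set": "VMware", "rule": "VMDK file", "extension": ".vmdk"},
]

def matches(path, rule):
    base = os.path.basename(path).lower()
    if "full_name" in rule and base != rule["full_name"]:
        return False
    if "extension" in rule and not base.endswith(rule["extension"]):
        return False
    return True

def interesting_hits(path):
    """Return 'set/rule' labels for every rule the file matches."""
    return [f'{r["set"]}/{r["rule"]}' for r in RULES if matches(path, r)]

print(interesting_hits("/Users/alice/vm/disk.vmdk"))  # ['VMware/VMDK file']
```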
Encryption Detection Module
● What does it do: flags files and volumes that are or could be encrypted
● Why use it: to ensure you are aware of files that may contain additional evidence
● Detects:
○ Passwords on Office docs and Access DBs
○ Possibly encrypted files or volumes: high entropy (appears random), size a multiple of 512 bytes, and no known file type (see the sketch below)
● Results are shown as “detected” vs. “suspected”
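A sketch of the entropy heuristic; the 7.5 bits-per-byte threshold is an illustrative guess, not Autopsy's actual cutoff:

```python
import math
import os
from collections import Counter

def shannon_entropy(data: bytes) -> float:
    """Bits per byte, 0..8; fully random (or encrypted) data approaches 8."""
    counts = Counter(data)
    n = len(data)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def looks_encrypted(data: bytes, threshold: float = 7.5) -> bool:
    """Heuristic per the criteria above: high entropy plus a size that
    is a multiple of 512 bytes."""
    return (len(data) > 0
            and len(data) % 512 == 0
            and shannon_entropy(data) > threshold)

print(looks_encrypted(os.urandom(4096)))   # very likely True
print(looks_encrypted(b"hello " * 512))    # False: low entropy
```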
Plaso Module
● What does it do: uses the open source Plaso tool to parse various logs and file types to extract timestamps
● Why use it: to extract as many dates as possible for the timeline
● Notes:
○ Some timestamps duplicate what is already extracted
○ It can take a considerable amount of time to run, so it is disabled by default
Config
● Some features are disabled by default because they are slow
● You can enable them if you want those timestamps:
○ Registry timestamps and PE headers (executables); enabling these takes even longer, as this is already the longest-running module
Virtual Machine Extractor
● What does it do: analyzes virtual machines found in a data source
○ Detects vmdk and vhdi files in the data source
○ Makes a local copy of them → adds them back in as data sources
● Why use it: to process virtual machines that the device’s owner may have used; they may contain evidence
● No configuration; as results, new data sources will be added
Data Source Integrity
● What does it do: validates a disk image by calculating its hash
○ Retrieves the stored hash from the E01 (or what the user entered when the data source was added)
○ Calculates the current hash
○ Generates an alert if they are different
○ Saves the hash if there was not one already (see the sketch below)
● Why use it: to ensure the integrity of the evidence
● Failures are shown in the tree
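A sketch of the verify-or-save logic, assuming an MD5 hash and simple string statuses (the real module handles other hash types too):

```python
import hashlib

def verify_image(path, stored_md5=None):
    """Recompute the disk image hash and compare with the stored one.
    Returns a status string plus the freshly computed hash."""
    md5 = hashlib.md5()
    with open(path, "rb") as f:
        while chunk := f.read(1024 * 1024):
            md5.update(chunk)
    actual = md5.hexdigest()
    if stored_md5 is None:
        return "no stored hash; saving it", actual   # save for next time
    if actual.lower() != stored_md5.lower():
        return "ALERT: hash mismatch", actual        # integrity failure
    return "verified", actual
```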
Section 7 Lab
1. There were the following numbers of photos taken from these devices:
a. BLU R1 HD -- 15
b. iPhone 7 Plus -- 1
c. Samsung Galaxy S8 -- none?
2. The MIME type listed for D3D11_Default.shader-db.bin is application/octet-stream
3. The file size for D3D11_Default.shader-db.bin is 594728 KB
4. There are extension mismatch results
5. Some common file types with unexpected extensions include image/png and
application/x-msoff
6. Only VeraCrypt was found on the system, not TrueCrypt