regf

Every Registry file starts with a 4,096-byte header block. The first 512 bytes of that header describe the Registry file as a whole and contain the following fields:

Offset  Length  Type      What is it?
x000    4       string    Signature: “regf”
x004    4       uint32    Sequence Number 1
x008    4       uint32    Sequence Number 2
x00C    8       datetime  Last Modified Date
x014    4       uint32    Major ver
x018    4       uint32    Minor ver
x01C    4       uint32    Type (0=Registry file; 1=Log file)
x020    4       uint32    Format (1=?)
x024    4       uint32    Offset to root key record
x028    4       uint32    Offset to first non-used block
x02C    4       uint32    always 1?
x030    64      string16  File name
x070    8       ?         ??? a
x078    8       ?         ??? b
x080    8       ?         ??? a
x088    8       ?         ??? b
x090    4       ?         ??? Value is either 0 or 1
x094    8       ?         ??? a+1
x09C    8       ?         ??? b
x0A4    4       string    unknown: “rmtm”
x0A8    340     x00       padding
x1FC    4       uint32    Checksum (XOR32 of above)
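
The layout above maps fairly directly onto a fixed-size binary record. What follows is a minimal Python sketch of unpacking the documented fields (x000 through x06F), assuming all integers are little-endian. The field names and the function name parse_regf_header are my own labels for illustration, not official names.

```python
import struct
from collections import namedtuple

# Field names are illustrative labels for the table rows, not official names.
RegfHeader = namedtuple(
    "RegfHeader",
    "signature seq1 seq2 last_modified major minor file_type "
    "fmt root_offset end_offset unknown_2c raw_name",
)

# "<" = little-endian with no alignment padding (an assumption that matches
# the sample dump). Covers offsets x000 through x06F, 112 bytes in total.
_LAYOUT = struct.Struct("<4s2IQ7I64s")


def parse_regf_header(data: bytes) -> RegfHeader:
    """Unpack the documented fields from the start of a hive file."""
    if len(data) < _LAYOUT.size:
        raise ValueError("need at least 112 bytes of header")
    header = RegfHeader._make(_LAYOUT.unpack_from(data, 0))
    if header.signature != b"regf":
        raise ValueError("missing 'regf' signature")
    return header
```

Calling parse_regf_header on the first bytes of a hive should give a tuple whose seq1, seq2, major, and minor members line up with the rows of the table.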

 

Here’s what it would look like in a hex editor:

Offset  0  1  2  3  4  5  6  7  8  9  A  B  C  D  E  F
000     72 65 67 66 88 56 0C 00 88 56 0C 00 74 92 42 75  regfˆV··ˆV··t’Bu
        signature   seq# 1      seq# 2      date
010     11 3C D0 01 01 00 00 00 05 00 00 00 00 00 00 00  ·<Ð·············
        date (cont) maj ver     min ver     type (0/1)
020     01 00 00 00 20 00 00 00 00 F0 48 01 01 00 00 00  ···· ····ðH·····
        unknown (1) offset root offset end  unknown (1)
030     53 00 59 00 53 00 54 00 45 00 4D 00 00 00 00 00  S·Y·S·T·E·M·····
        name
040     00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ················
        name (cont)
050     00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ················
        name (cont)
060     00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ················
        name (cont)
070     CC 88 68 01 6F 6C DE 11 8D 1D 00 1E 0B CD E3 EC  Ìˆh·olÞ······Íãì
        unknown (a)             unknown (b)
080     CC 88 68 01 6F 6C DE 11 8D 1D 00 1E 0B CD E3 EC  Ìˆh·olÞ······Íãì
        unknown (a)             unknown (b)
090     01 00 00 00 CD 88 68 01 6F 6C DE 11 8D 1D 00 1E  ····Íˆh·olÞ·····
        unknown (1) unknown (a+1)           unknown (b)
0A0     0B CD E3 EC 72 6D 74 6D 00 00 00 00 00 00 00 00  ·Íãìrmtm········
        (b, cont)   rmtm        padding
0B0     00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ················
        padding (cont)
        (repeated lines removed)
1F0     00 00 00 00 00 00 00 00 00 00 00 00 21 62 DC 9C  ············!bÜœ
        padding (cont)                      checksum

 

Some interesting notes:

There are two sequence numbers, at offsets x4 and x8. If everything is operating as planned, these numbers should match; I’m assuming a mismatch tells the system the hive was not unmounted properly. They appear to be incremented each time the file is mounted, so they could provide an interesting forensic artifact. Has the registry been modified since last boot (has this number changed since then)? How many times has each user profile logged into this system (compare the numbers across all NTUSER.DATs)? More testing is needed to prove whether this is reliable and useful, but it is interesting.
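
The comparison described here is simple to automate. This is a small Python sketch, assuming both sequence numbers are little-endian uint32 values; the function names are hypothetical, and treating a mismatch as "not cleanly closed" is only the working assumption from the paragraph above, not a confirmed rule.

```python
import struct


def read_sequence_numbers(header: bytes) -> tuple:
    """Return (seq1, seq2) from offsets x4 and x8 of a regf header."""
    if len(header) < 12:
        raise ValueError("need at least 12 bytes of header")
    return struct.unpack_from("<II", header, 4)


def looks_cleanly_closed(header: bytes) -> bool:
    """True when the two sequence numbers match (assumed to mean a clean unmount)."""
    seq1, seq2 = read_sequence_numbers(header)
    return seq1 == seq2
```

Comparing the value of read_sequence_numbers across several NTUSER.DAT files would be one way to start testing the mount-count idea.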

The last modified time at offset xc does appear to jibe with the file’s last modified time in most cases. On a live system I saw discrepancies, which I am sure are related to NTFS caching in RAM and not updating the metadata on disk until unmount. In all cases where I saw a difference, the time inside the file was current and the time in the NTFS metadata lagged behind. More testing is needed to see if there is a reliable story this artifact tells us.
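
To compare the embedded timestamp with file system metadata, it first has to be decoded. The sketch below assumes the 8-byte value is a Windows FILETIME (a little-endian count of 100-nanosecond intervals since 1601-01-01 UTC); the function name is hypothetical.

```python
import struct
from datetime import datetime, timedelta, timezone

# FILETIME counts 100-nanosecond ticks from this epoch.
FILETIME_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)


def header_last_modified(header: bytes) -> datetime:
    """Decode the 8-byte timestamp at offset x0C as a UTC datetime."""
    if len(header) < 0x14:
        raise ValueError("need at least 20 bytes of header")
    (filetime,) = struct.unpack_from("<Q", header, 0x0C)
    # Ten ticks per microsecond; the sub-microsecond remainder is dropped.
    return FILETIME_EPOCH + timedelta(microseconds=filetime // 10)
```

For the sample header in the hex dump above, this decodes to a time on 29 January 2015 (UTC), which can then be set against the NTFS last modified time of the same file.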

The version number at offset x14 appears to be either 1.3 or 1.5 on Windows 7 systems. NTUSER.DAT, BCD-Template, COMPONENTS, SAM, and SECURITY are 1.3. DEFAULT, SOFTWARE, and SYSTEM are 1.5. I haven’t identified the difference between the two version numbers, though.

The type at offset x1c identifies whether this is an actual Registry file or a .log file. The Registry logs start with the same regf header, but what follows the header is different.

The number at offset x20 has been 1 on every file I’ve looked at. Some documentation calls this field “format” and others call it “unknown”. The ones that call it format don’t explain what that means, so I’m going to go with unknown. There is also a number at x2c that is always 1 and is even less known.

The two offsets at x24 and x28 are virtual locations within the Registry content, not positions in the file. The file is made up of HBINs that are 4k each, and each bin has a “hbin” signature at its start. The starting point is the first bin, which starts right after the 4k opening header block, so a virtual offset of x20 corresponds to file offset x1020.
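
Turning one of these virtual offsets into a file position is a single addition. A short Python sketch, assuming the first bin starts immediately after the 4k header block as described; the helper names are hypothetical.

```python
HEADER_BLOCK_SIZE = 0x1000  # the 4k opening header block


def file_offset(virtual_offset: int) -> int:
    """Map an offset relative to the first bin onto a position in the file."""
    return HEADER_BLOCK_SIZE + virtual_offset


def has_hbin_at(hive: bytes, virtual_offset: int) -> bool:
    """Check for the 'hbin' signature at the given virtual offset."""
    pos = file_offset(virtual_offset)
    return hive[pos:pos + 4] == b"hbin"
```

With the sample header, the root key offset of x20 lands at file offset x1020, and has_hbin_at(hive, 0) is a quick sanity check that the first bin sits where expected.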

The name at offset x30 is in Unicode. Sometimes it contains a path, sometimes it is just a file name. When it does contain a path, the field isn’t long enough for the full path, so it gets truncated. There really doesn’t seem to be much sense to it, as a file and its multiple associated log files will have different data in this field: different capitalization, different starting points for the path, and other oddities.
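
Reading the name field takes only a few lines. This sketch assumes the 64 bytes are UTF-16LE text padded with x00, which matches the "S·Y·S·T·E·M" pattern in the dump; the function name is hypothetical.

```python
def header_file_name(header: bytes) -> str:
    """Decode the 64-byte name field at offset x30, stopping at the first NUL."""
    if len(header) < 0x70:
        raise ValueError("need at least 112 bytes of header")
    raw = header[0x30:0x70]
    # errors="replace" keeps going if a truncated path leaves odd bytes behind.
    text = raw.decode("utf-16-le", errors="replace")
    return text.split("\x00", 1)[0]
```

Running this over a hive and each of its log files makes the differences in capitalization and path truncation easy to list side by side.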

Most of the documentation calls everything after x70 “reserved” or “unknown”. There is clearly a structure there, but I can’t tell what those numbers are referring to. The same two numbers get repeated three times, with the third iteration having the first number incremented by one. The string “rmtm” appears in there, which obviously means something. There is also a number between the second and third iteration that is sometimes 0 and sometimes 1, but I can’t tell what that is denoting.
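
One way to test whether this pattern holds across many hives is to pull the values out and compare them. The sketch below treats each 8-byte value as a little-endian unsigned 64-bit integer, which is purely an assumption for the sake of comparison; the function names and dictionary keys are hypothetical.

```python
import struct


def read_trailing_fields(header: bytes) -> dict:
    """Extract the undocumented values between x70 and xA7."""
    if len(header) < 0xA8:
        raise ValueError("need at least 168 bytes of header")
    a1, b1, a2, b2, flag, a3, b3 = struct.unpack_from("<QQQQIQQ", header, 0x70)
    return {
        "a": (a1, a2, a3),
        "b": (b1, b2, b3),
        "flag": flag,
        "marker": header[0xA4:0xA8],
    }


def matches_observed_pattern(header: bytes) -> bool:
    """True if the header shows the a, b, a, b, 0/1, a+1, b, 'rmtm' layout."""
    fields = read_trailing_fields(header)
    a1, a2, a3 = fields["a"]
    b1, b2, b3 = fields["b"]
    return (
        a1 == a2
        and a3 == a1 + 1
        and b1 == b2 == b3
        and fields["flag"] in (0, 1)
        and fields["marker"] == b"rmtm"
    )
```

A hive where matches_observed_pattern returns False would be a good candidate for a closer look at what these fields encode.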

Lastly, there is a checksum at offset x1fc. One source identified this as the XOR32 of the preceding 508 bytes. I haven’t taken the time to get an XOR32 hashing tool to verify that.
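
Checking that claim does not need a dedicated tool. Here is a minimal Python sketch, assuming "XOR32" means XOR-ing together the 127 little-endian 32-bit words that make up the first 508 bytes; the function names are hypothetical.

```python
import struct


def xor32(header: bytes) -> int:
    """XOR together the little-endian uint32 words in bytes x000 through x1FB."""
    if len(header) < 0x200:
        raise ValueError("need the full 512-byte header")
    checksum = 0
    for (word,) in struct.iter_unpack("<I", header[:0x1FC]):
        checksum ^= word
    return checksum


def stored_checksum(header: bytes) -> int:
    """Read the checksum field at offset x1FC."""
    return struct.unpack_from("<I", header, 0x1FC)[0]
```

Run against the sample header from the hex dump (with the omitted padding lines filled with x00), xor32 returns x9CDC6221, which is the same value stored at x1FC, so the description holds for at least that one file.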

In a LOG file, the next thing after this header, at offset x200 (the 513th byte), is the Dirty Vector. It starts with a signature of “DIRT”. What follows is a bitmap where every bit set to 1 indicates an hbin that has changed.
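
Walking the bitmap could look something like the sketch below. Several things here are assumptions rather than verified facts: that the vector begins directly after the 512-byte header at offset x200, that the bitmap starts right after the 4-byte signature, that bits are numbered from the least significant bit of each byte, and that the caller already knows how many bits to read. The function name is hypothetical.

```python
DIRT_OFFSET = 0x200  # assumed: directly after the 512-byte header


def dirty_indexes(log: bytes, bit_count: int) -> list:
    """Return the indexes of bits set to 1 in the Dirty Vector bitmap."""
    if log[DIRT_OFFSET:DIRT_OFFSET + 4] != b"DIRT":
        raise ValueError("missing 'DIRT' signature")
    bitmap = log[DIRT_OFFSET + 4:]
    dirty = []
    for index in range(bit_count):
        byte_pos, bit = divmod(index, 8)
        if byte_pos >= len(bitmap):
            break  # ran out of bitmap data
        # Assumed bit order: bit 0 is the least significant bit of each byte.
        if (bitmap[byte_pos] >> bit) & 1:
            dirty.append(index)
    return dirty
```

If the bit order turns out to be most significant bit first, only the shift expression needs to change.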

In regular Registry files, the rest of the opening 4k block, 3,584 bytes’ worth, is just x00 padding.