An Elf parser.
Parser for ELF object format
melf
A Haskell library to parse/serialize Executable and Linkable Format (ELF) files.
Parsing the header and table entries
Module Data.Elf.Headers
implements parsing and serialization of the ELF file header and the entries of section and segment tables.
ELF files come in two flavors: 64-bit and 32-bit. To differentiate between them type ElfClass
is defined:
data ElfClass
= ELFCLASS32 -- ^ 32-bit ELF format
| ELFCLASS64 -- ^ 64-bit ELF format
deriving (Eq, Show)
Singleton types for ElfClass
are also defined:
-- | Singletons for ElfClass
data SingElfClass :: ElfClass -> Type where
SELFCLASS32 :: SingElfClass 'ELFCLASS32 -- ^ Singleton for `ELFCLASS32`
SELFCLASS64 :: SingElfClass 'ELFCLASS64 -- ^ Singleton for `ELFCLASS64`
Some fields of the header and table entries have different bitwidth for 64-bit and 32-bit files. So the type WordXX a
was borrowed from the data-elf
package:
-- | @SingElfClassI a@ is defined for each constructor of `ElfClass`.
-- It defines @WordXX a@, which is `Word32` for `ELFCLASS32` and `Word64` for `ELFCLASS64`.
-- Also it defines singletons for each of the `ElfClass` type.
class ( Typeable c
, Typeable (WordXX c)
, Data (WordXX c)
, Show (WordXX c)
, Read (WordXX c)
, Eq (WordXX c)
, Ord (WordXX c)
, Bounded (WordXX c)
, Enum (WordXX c)
, Num (WordXX c)
, Integral (WordXX c)
, Real (WordXX c)
, Bits (WordXX c)
, FiniteBits (WordXX c)
, Binary (Be (WordXX c))
, Binary (Le (WordXX c))
) => SingElfClassI (c :: ElfClass) where
type WordXX c = r | r -> c
singElfClass :: SingElfClass c
instance SingElfClassI 'ELFCLASS32 where
type WordXX 'ELFCLASS32 = Word32
singElfClass = SELFCLASS32
instance SingElfClassI 'ELFCLASS64 where
type WordXX 'ELFCLASS64 = Word64
singElfClass = SELFCLASS64
The header of the ELF file is represented with the type HeaderXX a
:
-- | Parsed ELF header
data HeaderXX c =
HeaderXX
{ hData :: ElfData -- ^ Data encoding (big- or little-endian)
, hOSABI :: ElfOSABI -- ^ OS/ABI identification
, hABIVersion :: Word8 -- ^ ABI version
, hType :: ElfType -- ^ Object file type
, hMachine :: ElfMachine -- ^ Machine type
, hEntry :: WordXX c -- ^ Entry point address
, hPhOff :: WordXX c -- ^ Program header offset
, hShOff :: WordXX c -- ^ Section header offset
, hFlags :: Word32 -- ^ Processor-specific flags
, hPhEntSize :: Word16 -- ^ Size of program header entry
, hPhNum :: Word16 -- ^ Number of program header entries
, hShEntSize :: Word16 -- ^ Size of section header entry
, hShNum :: Word16 -- ^ Number of section header entries
, hShStrNdx :: ElfSectionIndex -- ^ Section name string table index
}
So we have two types HeaderXX 'ELFCLASS64
and HeaderXX 'ELFCLASS32
. To be able to work with headers uniformly the type Header
was introduced:
-- | Header is a sigma type where the first entry defines the type of the second one
data Header = forall a . Header (SingElfClass a) (HeaderXX a)
Header
is a pair. The first element is an object of the type SingElfClass
defining the width of the word. The second element is HeaderXX
parametrized with the first element (i. e. Σ-type from the languages with dependent types).
Header
is an instance of the Binary
class.
So given a lazy bytestring containing large enough initial part of ELF file one can get the header of that file with a function like this:
withHeader :: BSL.ByteString ->
(forall a . SingElfClassI a => HeaderXX a -> b) -> Either String b
withHeader bs f =
case decodeOrFail bs of
Left (_, _, err) -> Left err
Right (_, _, (Header sing hxx)) -> Right $ withSingElfClassI sing f hxx
The function decodeOrFail
is defined in the package binary
. The function withSingElfClassI
creates a context with an implicit word width available and looks like withSingI
:
-- | Convenience function for creating a context with an implicit singleton available.
withSingElfClassI :: SingElfClass c -> (SingElfClassI c => r) -> r
withSingElfClassI SELFCLASS64 x = x
withSingElfClassI SELFCLASS32 x = x
The module Data.Elf.Headers
also defines the types SectionXX
, SegmentXX
and SymbolXX
for the elements of section, segment and symbol tables.
Parsing the whole ELF file
The module Data.Elf
implements parsing and serialization of the whole ELF files. To parse ELF file it reads ELF header, section table and segment table and uses that data to create a list of type ElfListXX
of elements of the type ElfXX
representing the recursive structure of the ELF file. It also restores section names from the the string table indexes. That results in creating an object of type Elf
:
-- | `Elf` is a forrest of trees of type `ElfXX`.
-- Trees are composed of `ElfXX` nodes, `ElfSegment` can contain subtrees
data ElfNodeType = Header | SectionTable | SegmentTable | Section | Segment | RawData | RawAlign
-- | List of ELF nodes.
data ElfListXX c where
ElfListCons :: ElfXX t c -> ElfListXX c -> ElfListXX c
ElfListNull :: ElfListXX c
-- | Elf is a sigma type where the first entry defines the type of the second one
data Elf = forall a . Elf (SingElfClass a) (ElfListXX a)
-- | Section data may contain a string table.
-- If a section contains a string table with section names, the data
-- for such a section is generated and `esData` should contain `ElfSectionDataStringTable`
data ElfSectionData c
= ElfSectionData -- ^ Regular section data
{ esdData :: BSL.ByteString -- ^ The content of the section
}
| ElfSectionDataStringTable -- ^ Section data will be generated from section names
| ElfSectionDataNoBits -- ^ SHT_NOBITS uninitialized section data: section has size but no content
{ esdSize :: WordXX c -- ^ Size of the section
}
-- | The type of node that defines Elf structure.
data ElfXX t c where
ElfHeader ::
{ ehData :: ElfData -- ^ Data encoding (big- or little-endian)
, ehOSABI :: ElfOSABI -- ^ OS/ABI identification
, ehABIVersion :: Word8 -- ^ ABI version
, ehType :: ElfType -- ^ Object file type
, ehMachine :: ElfMachine -- ^ Machine type
, ehEntry :: WordXX c -- ^ Entry point address
, ehFlags :: Word32 -- ^ Processor-specific flags
} -> ElfXX 'Header c
ElfSectionTable :: ElfXX 'SectionTable c
ElfSegmentTable :: ElfXX 'SegmentTable c
ElfSection ::
{ esName :: String -- ^ Section name (NB: string, not offset in the string table)
, esType :: ElfSectionType -- ^ Section type
, esFlags :: ElfSectionFlag -- ^ Section attributes
, esAddr :: WordXX c -- ^ Virtual address in memory
, esAddrAlign :: WordXX c -- ^ Address alignment boundary
, esEntSize :: WordXX c -- ^ Size of entries, if section has table
, esN :: ElfSectionIndex -- ^ Section number
, esInfo :: Word32 -- ^ Miscellaneous information
, esLink :: Word32 -- ^ Link to other section
, esData :: ElfSectionData c -- ^ The content of the section
} -> ElfXX 'Section c
ElfSegment ::
{ epType :: ElfSegmentType -- ^ Type of segment
, epFlags :: ElfSegmentFlag -- ^ Segment attributes
, epVirtAddr :: WordXX c -- ^ Virtual address in memory
, epPhysAddr :: WordXX c -- ^ Physical address
, epAddMemSize :: WordXX c -- ^ Add this amount of memory after the section when the section is loaded to memory by execution system.
-- Or, in other words this is how much `pMemSize` is bigger than `pFileSize`
, epAlign :: WordXX c -- ^ Alignment of segment
, epData :: ElfListXX c -- ^ Content of the segment
} -> ElfXX 'Segment c
-- | Some ELF files (some executables) don't bother to define
-- sections for linking and have just raw data in segments.
ElfRawData ::
{ edData :: BSL.ByteString -- ^ Raw data in ELF file
} -> ElfXX 'RawData c
-- | Align the next data in the ELF file.
-- The offset of the next data in the ELF file
-- will be the minimal @x@ such that
-- @x mod eaAlign == eaOffset mod eaAlign @
ElfRawAlign ::
{ eaOffset :: WordXX c -- ^ Align value
, eaAlign :: WordXX c -- ^ Align module
} -> ElfXX 'RawAlign c
infixr 9 ~:
-- | Helper for `ElfListCons`
(~:) :: ElfXX t a -> ElfListXX a -> ElfListXX a
(~:) = ElfListCons
Not each object of that type can be serialized.
Constructor
ElfSection
still has a section number. It is required as the symbol table and some other structures refer to the sections by theirs indexes. So the section indexes should be consecutive integers starting from 1. Section with index 0 is always empty and is created by the library.There should be a single
ElfHeader
. It should be the first nonempty node of the tree.If there exists at least one node
ElfSection
then there should exist exactly one nodeElfSectionTable
and exactly one section that hasElfSectionDataStringTable
as the value of itsesData
field (the string table for the names of sections).If there exists at least one node
ElfSegment
then there should exist exactly one nodeElfSegmentTable
.
Correctly composed ELF object can be serialized with the function serializeElf
and parsed with the function parseElf
:
serializeElf :: MonadThrow m => Elf -> m ByteString
parseElf :: MonadCatch m => ByteString -> m Elf
ELF
is not an instance of the class Binary
because PutM
is not an instance of the class MonadFail
.
Generation of object files
To create machine code that is used in the examples a pair of modules were created. The module AsmAArch64
provides a DSL embedded in Haskell. This DSL is a kind of assembler language for the AArch64 platform. It exports some primitives to generate machine instructions and organize machine code. It also exports function assemble
that consumes the monad composed of those primitives and produces an object of the type Elf
:
assemble :: MonadCatch m => StateT CodeState m () -> m Elf
The idea was inspired by the article "Monads to Machine Code" by Stephen Diehl. Detailed description of this module is available in russian: README_ru.md (outdated).
The module HelloWorld
uses primitives from AsmAArch64
to compose relocatable executable code that uses system calls to output a "Hello World!" message into standard output and exit:
helloWorld :: MonadCatch m => StateT CodeState m ()
Function assemble
uses the melf
library to generate an object file:
return $ Elf SELFCLASS64 $
ElfHeader
{ ehData = ELFDATA2LSB
, ehOSABI = ELFOSABI_SYSV
, ehABIVersion = 0
, ehType = ET_REL
, ehMachine = EM_AARCH64
, ehEntry = 0
, ehFlags = 0
}
~: ElfSection
{ esName = ".text"
, esType = SHT_PROGBITS
, esFlags = SHF_EXECINSTR .|. SHF_ALLOC
, esAddr = 0
, esAddrAlign = 8
, esEntSize = 0
, esN = textSecN
, esLink = 0
, esInfo = 0
, esData = ElfSectionData txt
}
~: ElfSection
{ esName = ".shstrtab"
, esType = SHT_STRTAB
, esFlags = 0
, esAddr = 0
, esAddrAlign = 1
, esEntSize = 0
, esN = shstrtabSecN
, esLink = 0
, esInfo = 0
, esData = ElfSectionDataStringTable
}
~: ElfSection
{ esName = ".symtab"
, esType = SHT_SYMTAB
, esFlags = 0
, esAddr = 0
, esAddrAlign = 8
, esEntSize = symbolTableEntrySize ELFCLASS64
, esN = symtabSecN
, esLink = fromIntegral strtabSecN
, esInfo = 1
, esData = ElfSectionData symbolTableData
}
~: ElfSection
{ esName = ".strtab"
, esType = SHT_STRTAB
, esFlags = 0
, esAddr = 0
, esAddrAlign = 1
, esEntSize = 0
, esN = strtabSecN
, esLink = 0
, esInfo = 0
, esData = ElfSectionData stringTableData
}
~: ElfSectionTable
~: ElfListNull
It runs the State
monad that was passed as an argument. As a result the final state of CodeState
includes all the data neсessary to produce ELF file, in particular:
txt
refers to the content of the.text
section,symbolTableData
refers to the content of the symbol table section,stringTableData
refers to the content of the string table section linked to the symbol table.
Names with SecN
suffixes (textSecN
, shstrtabSecN
, symtabSecN
, strtabSecN
) are predefined section numbers that conform to the conditions stated above.
For the sake of simplicity external symbol resolution and data section allocation were not implemented. It requires implementation of relocation tables. On the other hand, the resulting code is position-independent.
Use this module to produce object file and try to link it:
[nix-shell:examples]$ ghci
GHCi, version 8.10.7: https://www.haskell.org/ghc/ :? for help
Prelude> :l AsmAArch64.hs HelloWorld.hs
[1 of 2] Compiling AsmAArch64 ( AsmAArch64.hs, interpreted )
[2 of 2] Compiling HelloWorld ( HelloWorld.hs, interpreted )
Ok, two modules loaded.
*AsmAArch64> import HelloWorld
*AsmAArch64 HelloWorld> elf <- assemble helloWorld
*AsmAArch64 HelloWorld> bs <- serializeElf elf
*AsmAArch64 HelloWorld> BSL.writeFile "helloWorld.o" bs
*AsmAArch64 HelloWorld>
Leaving GHCi.
[nix-shell:examples]$ aarch64-unknown-linux-gnu-gcc -nostdlib helloWorld.o -o helloWorld
[nix-shell:examples]$
The linker accepted the object file. Try to run the result:
[nix-shell:examples]$ qemu-aarch64 helloWorld
Hello World!
[nix-shell:examples]$
It works.
Generation of executable files
The module DummyLd
uses the section .text
of object file to create an executable file. Code relocation and symbol resolution is not implemented so that procedure works only for position-independent code that does not refer to external translation units, for example, it works with the code described above.
Function dummyLd
consumes an object of the type Elf
and finds a section .text
(using elfFindSectionByName
) and header (using elfFindHeader
) in it. Then the header type is changed to ET_EXEC
, the address of the first executable instruction is specified and a loadable segment containing the header and the content of .text
is formed:
data MachineConfig a
= MachineConfig
{ mcAddress :: WordXX a -- ^ Virtual address of the executable segment
, mcAlign :: WordXX a -- ^ Required alignment of the executable segment
-- in physical memory (depends on max page size)
}
getMachineConfig :: (SingElfClassI a, MonadThrow m) => ElfMachine -> m (MachineConfig a)
getMachineConfig EM_AARCH64 = return $ MachineConfig 0x400000 0x10000
getMachineConfig EM_X86_64 = return $ MachineConfig 0x400000 0x1000
getMachineConfig _ = $chainedError "could not find machine config for this arch"
dummyLd' :: forall a m . (MonadThrow m, SingElfClassI a) => ElfListXX a -> m (ElfListXX a)
dummyLd' es = do
section' <- elfFindSectionByName es ".text"
txtSectionData <- case esData section' of
ElfSectionData textData -> return textData
_ -> $chainedError "could not find correct \".text\" section"
header' <- elfFindHeader es
MachineConfig { .. } <- getMachineConfig (ehMachine header')
return $
case header' of
ElfHeader { .. } ->
ElfSegment
{ epType = PT_LOAD
, epFlags = PF_X .|. PF_R
, epVirtAddr = mcAddress
, epPhysAddr = mcAddress
, epAddMemSize = 0
, epAlign = mcAlign
, epData =
ElfHeader
{ ehType = ET_EXEC
, ehEntry = mcAddress + headerSize (fromSing $ sing @a)
, ..
}
~: ElfRawData
{ edData = txtSectionData
}
~: ElfListNull
}
~: ElfSegmentTable
~: ElfListNull
-- | @dummyLd@ places the content of ".text" section of the input ELF
-- into the loadable segment of the resulting ELF.
-- This could work if there are no relocations or references to external symbols.
dummyLd :: MonadThrow m => Elf -> m Elf
dummyLd (Elf c l) = Elf c <$> withSingElfClassI c dummyLd' l
Try to use this code to produce executable file without GNU linker:
[nix-shell:examples]$ ghci
GHCi, version 8.10.7: https://www.haskell.org/ghc/ :? for help
Prelude> :l DummyLd.hs
[1 of 1] Compiling DummyLd ( DummyLd.hs, interpreted )
Ok, one module loaded.
*DummyLd> import Data.ByteString.Lazy as BSL
*DummyLd BSL> i <- BSL.readFile "helloWorld.o"
*DummyLd BSL> elf <- parseElf i
*DummyLd BSL> elf' <- dummyLd elf
*DummyLd BSL> o <- serializeElf elf'
*DummyLd BSL> BSL.writeFile "helloWorld2" o
*DummyLd BSL>
Leaving GHCi.
[nix-shell:examples]$ chmod +x helloWorld2
[nix-shell:examples]$ qemu-aarch64 helloWorld2
Hello World!
[nix-shell:examples]$
It works.
Related work
These just parse/serialize ELF header and table entries but not the whole ELF files.
History
For the early history look at the branch "amakarov" of the my copy of the elf repo.
Tests
Test data is committed with git-lfs. Only testdata/orig/* tests are included to hackage distributive to keep the tarball size small.
License
BSD 3-Clause License (c) Aleksey Makarov.