KARONTE: Detecting Insecure Multi-binary Interactions in Embedded Firmware 읽기

Posted by : at

Category : fuzzing   paper_study



0. 주안점

논문의 요지 → 1) 어떤 새로운 기술을 도입해 2) 이만큼의 향상된 결과를 냈다

  • 즉, 논문의 핵심은 → ‘향상된 결과’를 낼 수 있었던 원인이 ‘어떤 기술’ 인가
  • 추가로, ‘이만큼 향상됐음’의 근거는 무엇인가

이번 논문 과제의 의도는 연구실이 기반하고 있는 연구 분야가 무엇인지 이해하는 것이라고 생각

따라서 ‘기술’에 주안점을 두고 논문 리딩 진행

💡논문에서 내가 얻어야 할 것은?

→ 향상된 결과의 기반이 되었던 기술적 발전은 무엇인가?

  • multiple binaries analysis + inter-binary data flow analysis

→ 그것은 어떤 아이디어에 기반하였는가?

  • Modeling &Tracking multi-binary interactions
  • IPC paradigm used in one firmware is likely to be finite(concept)
    • 한 바이너리의 data I/O 파악(mid-implementation) → inter-binary data flow recovery
  • by leveraging CPF(Communication Paradigm Finder) modules(Implementation)


시행착오

  • Appendix가 기술set 설명과 매우 밀접해 보여, 원활한 이해를 위해 Approach Overview를 읽고 바로 Appendix를 읽었음.
    • 그러나 Appendix는 핵심 기술의 component 레벨 요소에 관한 세부 설명이었음
    • 즉 논문 본문을 다 읽고 Appendix를 읽는 것이 효율적이었음


1. Introduction

1.1. Abstract

image.png

  • IoT devices - Software on these systems(Firmware)
    • Analysis challenged
      • Hardware-dependent
      • Not-standardized Environment → 1) unique & minimal 2) non-standard configurations
    • Analysis complicated
      • Multiple Binaries
        • Addressing the communication between executables is complicated
  • KARONTE
    • Static
    • Modeling & Tracking multi-binary interactions
    • Starting from taint information → insecure interactions → vulnerability identification
    • Evaluation
      1. On Tracking&Constraining Multi-binary interactions abilities
      2. On Scaling ability about:
        1. firmwares with various size and complexity


2. Background

Emersion of IoT devices

  • Vast amount of IoT devices introduced new-coined threats into cybersecurity landscape
  • Techniques that are invented, especially in the perspective of unpacking binaries, were insufficient


Constraint of traditional analysis techniques

  • Interconnected components(functionality based on multiple programs execution)

    ↔ analysis without accounting for the internal flow of data

    → Ignore meaningful constraints arise from inter-binary communication

    → Inability to differentiate sources of input(attacker-controlled or non-attacker-controlled)

    → Limited search performance that leads to uncovering only superficial bugs

Hence,

  • Consideration about multiple binary execution is necessary
  • Analysis on the data shared among multiple binary is necessary


KARONTE

  • Static
  • Track data flow
  • Intuition:
    1. IPC paradigm used in one firmware is likely to be finite
    2. Derive commonalities in the paradigm set
    3. Use the commonalities to detect input locations & inter-component interactions
    4. Use the verified interactions to track inter-component data flow
    5. Perform cross-binary taint analysis
    6. Detect insecure uses(potential vulnerabilities)


3. Approach Overview

여기 설명된 항목 정리하고 Appendix에 나온 거랑 매핑해서 정리하자


3.1. Firmware Pre-processing

  • unpack firmware sample using ‘binwalk’


3.2. Border Binaries Discovery

  • ‘Border Binary’: Binaries that export the device functionality to the outside world
    • Represents the point where accepts user requests & references user-controlled data


3.3. Binary Dependency Graph(BDG) Recovery

  • ‘BDG’
    • Directed graph
    • Models communication among border binaries by leveraging CPF(Communication Paradigm Finder) modules


3.4. Multi-binary Data-flow Analysis

  • By using ‘static taint engine’, the KARONTE:
    • Tracks data propagation
    • Collects data constraints
  • Simulate data propagation applying features gathered
    • from the target binary b to other binaries that have inbound edges from b


3.5. Insecure Interactions Detection

  • Identifying security issues


4. Border Binaries Discovery


5. Binary Dependency Graph(사실상 CPF가 핵심)


5.1. Purpose and Overview

  • BDG: detects data dependencies & model data propagation (setter binary → getter binary)
  • Challenges:
    • Inter-binary data propagation:
      • control flow information is useless because:
        • processes do NOT normally access other processes’ memory regions.
  • Solution:
    1. Model IPC paradigms by using CPFes(Communication Paradigm Finders)
    2. Use modeled paradigms to build a graph == BDG(Binary Dependency Graph)


5.2. Communication Paradigm Finders

Objective

  • Detect & Describe specific IPC paradigm which binary uses to share data
  • 1) explore binary & program path → 2) Does the path contains the necessary code to share data through the communication paradigm? → 3) If so, conduct deeper analysis using techniques below → 4) create edges of BDG utilizing features distinguished
    1. Data Key Recovery
    2. Flow Direction Determination
    3. Binary Set Magnification


Key Functionalities

  1. Data Key Recovery
    1. referenced by the binary for IPCs
    2. fundamental & essential → THE MOST IMPORTANT TECHNIQUE
  2. Flow Direction Determination(scope: one binary)
    1. Role of each program points: ‘Setter’ & ‘Getter’
    2. program points: access the data keys
  3. Binary Set Magnification(scope: whole firmware)
    1. if any ‘binaries’ refer to the data keys → scheduled for further analysis


Implementation

  • ‘Semantic CPF’: OS-Independent
    • Intuition: ‘Data Keys’ *IPC must rely on them *Often hard-coded in binaries
  • KARONTE uses:
    • Environment CPFes
    • File CPFes
    • Socket CPFes
    • Semantic CPFes


5.3. Building the BDG

(시간이 부족한 관계로 우선순위 하강)


6. Static Taint Analysis

The operation of the underlying taint engine


7. Multi-Binary Data-Flow Analysis

How KARONTE combines the taint engine with the BDG to do detection

BDG

BFG


7.1. Key Insight & Concepts

  • Paths with fewer constraints on user data dd are more likely to expose vulnerabilities.
  • BFG: Extended version of the BDG in the direction of ‘least strict set of constraint applied to the data shared among multiple binaries’


7.2. BFG Building

  1. Initialization
  2. Constraint Propagation


8. Insecure Interations Detection

Detection Target:

  1. Memory-Corruption bugs
  2. DoS vulnerabilities


9. Discussion


10. Evaluation


12. Conclusion

KARONTE: Detect insecure interactions among components of embedded firmware

+) Emphasizing the effectiveness of KARONTE

→ 어떻게 insecure interactions를 알아냈는지에 관해 설명할 수 있는 게 중요한 듯해, Evaluation 파트는 나중에 읽어보는 것으로 함




Appendix

Background를 읽으면서 이걸 같이 봐야 할 것 같음


A. Functions Identification

3 types of functions is the goal of distinguishment


  1. memcmp-like functions

    👥: sementically equvalent to memory comparisons

    1. methodology
      1. If the target function $f$ contains at least one loop, then:
        1. Scan the instructions in the body of the loop in the linear manner and list every program point $p$ which contains memory comparison instructions
        2. Compute a static backward slice $p$ → $f$’s entry point
        3. Inspect $f$’s args to clarify whether they could affect operands in $p$ and if so, then:
          1. $f$ becomes a candidate of memcmp-like functions
          2. Calculate the size of $f$(based on number of its basic blocks)
          3. Adopt BootStomp’s threshold to decrease the number of false positives
  2. strlen-like functions

    👥: calculate the length of a buffer

    1. methodology
      1. similar to memcmp-like function search implementation + ‘the existence of counter, which increases as the number of loop iteration goes’
  3. memcpy-like functions

    👥: copy the content of a memory location to another

    1. methodology
      1. same with the methodology of Bootstomp
  4. function body is not available
    1. Heuristically match strings on the name of the function

    (opinion) It may become an armpit of this system

  5. Optimization strategy
    1. Harness function generation: abstract the functions described above to minimize the resource of repeated execution

      → mitigate the ‘path explosion problem’

      → speed up, without losing precision.


B. Border Binaries Discovery

Connection mark → Flag

Network mark → Counter

Calculation Hardness: Connection mark »»»» Network mark

  1. Network mark calculation
    1. Retrieve all the memory comparisons within a binary == Assume these will refer to hard-coded network related strings
    2. The strings mostly referred within the basic block(the call to the memory comparison, too)
  2. Connenction mark calculation
    1. Forward static taint analysis + Backward static taint analysis
    2. Limitation on the {Number of functions analyzed, Time of symbolic exploration}
    3. For the case of path(source-sink) exploration failed:
      1. any imprecision from a function $f$ analysis⇒ analysis for $f$ is incomplete
      2. over 50%(setted threshold) of experiments ends up in incomplete analysis, then:
        1. conservatively set the connection mark.
    4. Regarding OS dependency: if the OS is unknown, then simply set the connection mark.
  3. Utilize the feature ‘cmp’ when calculating Parsing Score.

C. Communication Paradigm Finders

Purpose of CPF: When recovering BDG(Binary Dependency Graph), by leveraging CPF, it becomes able to map inter-binary data flow.


Aspect Environment CPF Semantic CPF
Trigger for Analysis Calls that set/get environment variables or execute binaries. Memory operations using hardcoded data keys as indices or references.
Primary Detection Strings representing environment keys or binary names. Functions setting or getting data based on predefined data keys.
Application Context More suited for analyzing high-level OS interactions. Ideal for low-level memory and firmware analysis.


  1. OS-dependent CPF: Environment CPF
    1. Key Idea
      1. Calling to a function setting (or getting) environment variables is almost necessary when sharing data through environment variables before executing another binary.
        1. Binary Execution Block Searching
        2. Path (entry-block) searching to find out program points calling environment variable setting functions
        3. reach-def analysis on path(entry-point) → arguments values determination
        4. Determined values == ‘data keys’
          • ‘Binary Set Magnification Functionality’ ⇒ reach-def analysis → arg strings collection → possible binary names inference
        5. If binary names are unable to resolve → find all the binaries that rely on the data keys previously recovered.
    2. Detection Focus
      1. Program Path
      2. Binary Level Interaction
  2. OS-independent CPF: Semantic CPF
    1. Key Idea
      1. IPC often relies on predefined data keys, which is often hardcoded as constants.
    2. Detection Focus
      1. Function-Level Data Flow
      2. Memory Operations


D. Binary Dependency Graph Algorithm


E. Static Taint Analysis


F. Multi-binary Data-flow Analysis


G. Vulnerability Example



About TouBVA
TouBVA

A Security Researcher, now concentrating on Security Consulting.

Email : touBVa@gmail.com

Website : https://toubva.github.io