- 0. 주안점
- 1. Introduction
- 2. Background
- 3. Approach Overview
- 4. Border Binaries Discovery
- 5. Binary Dependency Graph(사실상 CPF가 핵심)
- 6. Static Taint Analysis
- 7. Multi-Binary Data-Flow Analysis
- 8. Insecure Interations Detection
- 9. Discussion
- 10. Evaluation
- 12. Conclusion
- Appendix
0. 주안점
논문의 요지 → 1) 어떤 새로운 기술을 도입해 2) 이만큼의 향상된 결과를 냈다
- 즉, 논문의 핵심은 → ‘향상된 결과’를 낼 수 있었던 원인이 ‘어떤 기술’ 인가
- 추가로, ‘이만큼 향상됐음’의 근거는 무엇인가
이번 논문 과제의 의도는 연구실이 기반하고 있는 연구 분야가 무엇인지 이해하는 것이라고 생각
따라서 ‘기술’에 주안점을 두고 논문 리딩 진행
💡논문에서 내가 얻어야 할 것은?
→ 향상된 결과의 기반이 되었던 기술적 발전은 무엇인가?
- multiple binaries analysis + inter-binary data flow analysis
→ 그것은 어떤 아이디어에 기반하였는가?
- Modeling &Tracking multi-binary interactions
- IPC paradigm used in one firmware is likely to be finite(concept)
- 한 바이너리의 data I/O 파악(mid-implementation) → inter-binary data flow recovery
- by leveraging CPF(Communication Paradigm Finder) modules(Implementation)
시행착오
- Appendix가 기술set 설명과 매우 밀접해 보여, 원활한 이해를 위해 Approach Overview를 읽고 바로 Appendix를 읽었음.
- 그러나 Appendix는 핵심 기술의 component 레벨 요소에 관한 세부 설명이었음
- 즉 논문 본문을 다 읽고 Appendix를 읽는 것이 효율적이었음
1. Introduction
1.1. Abstract
- IoT devices - Software on these systems(Firmware)
- Analysis challenged
- Hardware-dependent
- Not-standardized Environment → 1) unique & minimal 2) non-standard configurations
- Analysis complicated
- Multiple Binaries
- Addressing the communication between executables is complicated
- Multiple Binaries
- Analysis challenged
- KARONTE
- Static
- Modeling & Tracking multi-binary interactions
- Starting from taint information → insecure interactions → vulnerability identification
- Evaluation
- On Tracking&Constraining Multi-binary interactions abilities
- On Scaling ability about:
- firmwares with various size and complexity
2. Background
Emersion of IoT devices
- Vast amount of IoT devices introduced new-coined threats into cybersecurity landscape
- Techniques that are invented, especially in the perspective of unpacking binaries, were insufficient
Constraint of traditional analysis techniques
-
Interconnected components(functionality based on multiple programs execution)
↔ analysis without accounting for the internal flow of data
→ Ignore meaningful constraints arise from inter-binary communication
→ Inability to differentiate sources of input(attacker-controlled or non-attacker-controlled)
→ Limited search performance that leads to uncovering only superficial bugs
Hence,
- Consideration about multiple binary execution is necessary
- Analysis on the data shared among multiple binary is necessary
KARONTE
- Static
- Track data flow
- Intuition:
- IPC paradigm used in one firmware is likely to be finite
- Derive commonalities in the paradigm set
- Use the commonalities to detect input locations & inter-component interactions
- Use the verified interactions to track inter-component data flow
- Perform cross-binary taint analysis
- Detect insecure uses(potential vulnerabilities)
3. Approach Overview
여기 설명된 항목 정리하고 Appendix에 나온 거랑 매핑해서 정리하자
3.1. Firmware Pre-processing
- unpack firmware sample using ‘binwalk’
3.2. Border Binaries Discovery
- ‘Border Binary’: Binaries that export the device functionality to the outside world
- Represents the point where accepts user requests & references user-controlled data
3.3. Binary Dependency Graph(BDG) Recovery
- ‘BDG’
- Directed graph
- Models communication among border binaries by leveraging CPF(Communication Paradigm Finder) modules
3.4. Multi-binary Data-flow Analysis
- By using ‘static taint engine’, the KARONTE:
- Tracks data propagation
- Collects data constraints
- Simulate data propagation applying features gathered
- from the target binary b to other binaries that have inbound edges from b
3.5. Insecure Interactions Detection
- Identifying security issues
4. Border Binaries Discovery
5. Binary Dependency Graph(사실상 CPF가 핵심)
5.1. Purpose and Overview
- BDG: detects data dependencies & model data propagation (setter binary → getter binary)
- Challenges:
- Inter-binary data propagation:
- control flow information is useless because:
- processes do NOT normally access other processes’ memory regions.
- control flow information is useless because:
- Inter-binary data propagation:
- Solution:
- Model IPC paradigms by using CPFes(Communication Paradigm Finders)
- Use modeled paradigms to build a graph == BDG(Binary Dependency Graph)
5.2. Communication Paradigm Finders
Objective
- Detect & Describe specific IPC paradigm which binary uses to share data
- 1) explore binary & program path → 2) Does the path contains the necessary code to share data through the communication paradigm? → 3) If so, conduct deeper analysis using techniques below → 4) create edges of BDG utilizing features distinguished
- Data Key Recovery
- Flow Direction Determination
- Binary Set Magnification
Key Functionalities
- Data Key Recovery
- referenced by the binary for IPCs
- fundamental & essential → THE MOST IMPORTANT TECHNIQUE
- Flow Direction Determination(scope: one binary)
- Role of each program points: ‘Setter’ & ‘Getter’
- program points: access the data keys
- Binary Set Magnification(scope: whole firmware)
- if any ‘binaries’ refer to the data keys → scheduled for further analysis
Implementation
- ‘Semantic CPF’: OS-Independent
- Intuition: ‘Data Keys’ *IPC must rely on them *Often hard-coded in binaries
- KARONTE uses:
- Environment CPFes
- File CPFes
- Socket CPFes
- Semantic CPFes
5.3. Building the BDG
(시간이 부족한 관계로 우선순위 하강)
6. Static Taint Analysis
The operation of the underlying taint engine
7. Multi-Binary Data-Flow Analysis
How KARONTE combines the taint engine with the BDG to do detection
BDG
BFG
7.1. Key Insight & Concepts
- Paths with fewer constraints on user data dd are more likely to expose vulnerabilities.
- BFG: Extended version of the BDG in the direction of ‘least strict set of constraint applied to the data shared among multiple binaries’
7.2. BFG Building
- Initialization
- Constraint Propagation
8. Insecure Interations Detection
Detection Target:
- Memory-Corruption bugs
- DoS vulnerabilities
9. Discussion
10. Evaluation
12. Conclusion
KARONTE: Detect insecure interactions among components of embedded firmware
+) Emphasizing the effectiveness of KARONTE
→ 어떻게 insecure interactions를 알아냈는지에 관해 설명할 수 있는 게 중요한 듯해, Evaluation 파트는 나중에 읽어보는 것으로 함
Appendix
Background를 읽으면서 이걸 같이 봐야 할 것 같음
A. Functions Identification
3 types of functions is the goal of distinguishment
-
memcmp-like functions
👥: sementically equvalent to memory comparisons
- methodology
- If the target function $f$ contains at least one loop, then:
- Scan the instructions in the body of the loop in the linear manner and list every program point $p$ which contains memory comparison instructions
- Compute a static backward slice $p$ → $f$’s entry point
- Inspect $f$’s args to clarify whether they could affect operands in $p$ and if so, then:
- $f$ becomes a candidate of memcmp-like functions
- Calculate the size of $f$(based on number of its basic blocks)
- Adopt
BootStomp
’s threshold to decrease the number of false positives
- If the target function $f$ contains at least one loop, then:
- methodology
-
strlen-like functions
👥: calculate the length of a buffer
- methodology
- similar to memcmp-like function search implementation + ‘the existence of counter, which increases as the number of loop iteration goes’
- methodology
-
memcpy-like functions
👥: copy the content of a memory location to another
- methodology
- same with the methodology of
Bootstomp
- same with the methodology of
- methodology
- function body is not available
- Heuristically match strings on the name of the function
(opinion) It may become an armpit of this system
- Optimization strategy
-
Harness function generation: abstract the functions described above to minimize the resource of repeated execution
→ mitigate the ‘path explosion problem’
→ speed up, without losing precision.
-
B. Border Binaries Discovery
Connection mark → Flag
Network mark → Counter
Calculation Hardness: Connection mark »»»» Network mark
- Network mark calculation
- Retrieve all the memory comparisons within a binary == Assume these will refer to hard-coded network related strings
- The strings mostly referred within the basic block(the call to the memory comparison, too)
- Connenction mark calculation
- Forward static taint analysis + Backward static taint analysis
- Limitation on the {Number of functions analyzed, Time of symbolic exploration}
- For the case of path(source-sink) exploration failed:
- any imprecision from a function $f$ analysis⇒ analysis for $f$ is incomplete
- over 50%(setted threshold) of experiments ends up in incomplete analysis, then:
- conservatively set the connection mark.
- Regarding OS dependency: if the OS is unknown, then simply set the connection mark.
- Utilize the feature ‘cmp’ when calculating Parsing Score.
C. Communication Paradigm Finders
Purpose of CPF: When recovering BDG(Binary Dependency Graph), by leveraging CPF, it becomes able to map inter-binary data flow.
Aspect | Environment CPF | Semantic CPF |
---|---|---|
Trigger for Analysis | Calls that set/get environment variables or execute binaries. | Memory operations using hardcoded data keys as indices or references. |
Primary Detection | Strings representing environment keys or binary names. | Functions setting or getting data based on predefined data keys. |
Application Context | More suited for analyzing high-level OS interactions. | Ideal for low-level memory and firmware analysis. |
- OS-dependent CPF: Environment CPF
- Key Idea
- Calling to a function setting (or getting) environment variables is almost necessary when sharing data through environment variables before executing another binary.
- Binary Execution Block Searching
- Path (entry-block) searching to find out program points calling environment variable setting functions
- reach-def analysis on path(entry-point) → arguments values determination
- Determined values == ‘data keys’
-
- ‘Binary Set Magnification Functionality’ ⇒ reach-def analysis → arg strings collection → possible binary names inference
- If binary names are unable to resolve → find all the binaries that rely on the data keys previously recovered.
- Calling to a function setting (or getting) environment variables is almost necessary when sharing data through environment variables before executing another binary.
- Detection Focus
- Program Path
- Binary Level Interaction
- Key Idea
- OS-independent CPF: Semantic CPF
- Key Idea
- IPC often relies on predefined data keys, which is often hardcoded as constants.
- Detection Focus
- Function-Level Data Flow
- Memory Operations
- Key Idea