Added "-g" option for maximum number of user-specified tokens by antonio-morales · Pull Request #519 · AFLplusplus/AFLplusplus

Conversation

@antonio-morales
Copy link
Contributor

Sometimes I've needed to make use of bigger dictionaries (with more than 200 items). Until now, the only option was to modify MAX_DET_EXTRAS in config.h.

I think it would be a good idea to be able to modify this option at runtime.

@domenukk
Copy link
Member

Thanks for the PR.
Two things:
a) Did you benchmark it? I would guess probabilistic use of tokens makes sense for a large number of elements.
b) Maybe make this an env variable instead of a command-line switch? Something like AFL_MAX_DET_EXTRAS.

@vanhauser-thc
Copy link
Member

vanhauser-thc commented Aug 21, 2020

I thought about this and I propose something different: the extras dictionary should simply grow to as many entries as are added, with no limit. IMHO it doesn't make sense to have a limit.
Remember that there is not only -x file, but also -x directory (adding all files in that directory), plus the LTO autodictionary feature.
Checking beforehand how many entries that would be and then changing config.h or using -g is cumbersome.
If there are entries that the user wants to have added, they should be added.

@domenukk
Copy link
Member

domenukk commented Aug 21, 2020

As far as I understand (@antonio-morales, correct me if I'm wrong), you can add an unlimited number of dict entries, but once you pass the current threshold of 200, they are sometimes skipped, probabilistically.
The important lines are:

      /* Skip extras probabilistically if afl->extras_cnt > MAX_DET_EXTRAS. Also
         skip them if there's no room to insert the payload, if the token
         is redundant, or if its entire span has no bytes set in the effector
         map. */

      if ((afl->extras_cnt > MAX_DET_EXTRAS &&
           rand_below(afl, afl->extras_cnt) >= MAX_DET_EXTRAS) ||
...

This seems like an optimization so that other, non-dict mutations also get a chance.
Making the threshold configurable via env makes sense, but you may not actually want to raise it much, even if you have more tokens.
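
To make the quoted condition concrete, here is a minimal stand-alone sketch (not AFL++ code; rand_below() is a simplified stand-in for afl-fuzz's PRNG helper, and the counts are made-up example values):

/* Minimal sketch: models only the probabilistic skip gate quoted above. */
#include <stdio.h>
#include <stdlib.h>

#define MAX_DET_EXTRAS 200

static unsigned int rand_below(unsigned int limit) {
  return (unsigned int)rand() % limit;    /* roughly uniform in [0, limit) */
}

int main(void) {
  unsigned int extras_cnt = 400;          /* pretend dictionary size */
  unsigned int used = 0;

  for (unsigned int j = 0; j < extras_cnt; ++j) {
    /* Once extras_cnt exceeds MAX_DET_EXTRAS, each entry survives a given
       fuzz_one with probability MAX_DET_EXTRAS / extras_cnt. */
    if (extras_cnt > MAX_DET_EXTRAS &&
        rand_below(extras_cnt) >= MAX_DET_EXTRAS) continue;
    ++used;
  }

  printf("used %u of %u extras (~%u expected per fuzz_one)\n",
         used, extras_cnt, MAX_DET_EXTRAS);
  return 0;
}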

@antonio-morales
Copy link
Contributor Author

@domenukk @vanhauser-thc Yes, I have used bigger dictionaries on a few occasions in the past (modifying the MAX_DET_EXTRAS constant and rebuilding the code). I haven't benchmarked it but, if I'm not mistaken, it's a linear increase in time, i.e. O(n).
For example, if we have a file of size 512 bytes (a general formula is sketched right after this list):

  • 200 dictionary entries => 512 x 200 entries x 2 (over & insert) = approx. 204,800 iterations
  • 400 dictionary entries => 512 x 400 entries x 2 (over & insert) = approx. 409,600 iterations
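
Roughly, ignoring the effector map and the other skip conditions, the deterministic extras stages therefore cost something like:

  iterations ≈ file_size x extras_cnt x 2 (overwrite + insert)

so the time spent grows linearly with the number of dictionary entries.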

I think that dictionaries with more than 200 entries are useful, for example, if you're fuzzing complex interpreters/compilers.

The AFL_MAX_DET_EXTRAS variable looks like a good idea to me. And it allows you to save the "-g" flag for a future feature 😄

And I've kept the probabilistic functionality because I think it can be interesting in certain scenarios.
Here are two examples:

  • You have created a dictionary with 400 entries and you want to use all of them. So you set AFL_MAX_DET_EXTRAS=500, and AFL will use all the tokens (see the example command after this list).
  • You use an automatic tool for creating dictionaries, and this tool has generated a dictionary with 5000 entries. However, you think it's not worth using all of these entries in every iteration. So you set AFL_MAX_DET_EXTRAS=1000, and AFL will use the entries in a probabilistic way.
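
For the first case, the invocation would look something like this (dict.txt here is just a hypothetical dictionary file):

AFL_MAX_DET_EXTRAS=500 afl-fuzz -x dict.txt -i in -o out ./test-instr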

@domenukk
Copy link
Member

Can you take a look and see whether 1301552 is a good solution to your problem?

@vanhauser-thc
Copy link
Member

@domenukk This means the user has to make an extra effort: first checking how many dictionary entries there are to be loaded. With the LTO autodict this is additionally cumbersome. We should simply load all supplied entries. Expect the user to know what he/she is doing.

@domenukk
Copy link
Member

Ah, you mean something like an AFL_EXTRAS_DET_ALWAYS?

@domenukk
Copy link
Member

Just as a clarification, the important loop here:

for (j = 0; j < afl->extras_cnt; ++j) {

doesn't always skip the same random entries; it re-picks which entries to skip for each fuzz_one, so all of them will get picked up eventually.
This makes a lot of sense to me: if I have more than 200 dict entries, I don't want each fuzz_one to pick all of them up for each mutation, I want afl to pick random ones.
I would argue this env variable should only be used very occasionally, by people who know what they are doing.
Most of the time raising the limit will deteriorate fuzzing.

@domenukk domenukk closed this Aug 23, 2020
@domenukk domenukk reopened this Aug 23, 2020
@vanhauser-thc
Copy link
Member

No, I mean that there is currently a limit on the number of entries afl-fuzz reads in - and it aborts otherwise.
Try afl-fuzz -x dictionaries/ -i in -o out ./test-instr

@domenukk
Copy link
Member

Ah, @vanhauser-thc, gotcha. How about #523?

@vanhauser-thc
Copy link
Member

If #523 is merged, is this issue/feature done, or is there anything still open?

@antonio-morales
Copy link
Contributor Author

antonio-morales commented Aug 25, 2020

@domenukk @vanhauser-thc I think 1301552 and #523 work well, so this issue can be closed now 👍

@domenukk domenukk closed this Aug 25, 2020
abertschi pushed a commit to mattweingarten/AFLplusplus that referenced this pull request Apr 21, 2022