Solved Parsing similarly to CLIs

Discussion in 'Plugin Development' started by Shqep, Aug 9, 2021.

Thread Status:
Not open for further replies.
  1. Offline

    Shqep

    This will probably be a long post. I put spoilers so your screen wouldn't be flooded with my text wall.

    The problem

    I wanted to make a parser that takes the list of arguments Bukkit's CommandExecutor provides and parses it into properties and arguments without any initial data.
    For example, given the command:
    /report blah blah blah --player Notch --reason "He stole my diamond hoe" --random-flag -f

    I want the parser to return a Pair, just a generic tuple that holds 2 values, first and second. First is a list of arguments, second is a properties map.

    The arguments list contains all arguments that are neither a property nor bound to a property; that is, an argument that does not begin with a dash and does not follow an argument that begins with a dash.
    With the command above, this list would be: [blah, blah, blah]

    Next, the properties map contains all other arguments, which are properties or bound to one. A property is an argument that begins with a dash, and an argument is considered bound when it follows a property and is itself NOT a property.
    With the command above, the map would be:
    {"--player": "Notch", "--reason": "He stole my diamond hoe", "--random-flag": null, "-f": null}
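
    To make the two rules above concrete, here is a minimal sketch that applies them to tokens that have already been split and unquoted. It is only an illustration, not the parser from this thread: the names ClassifySketch, Parsed and classify are made up, and Parsed stands in for the Pair.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class ClassifySketch {
    // Hypothetical holder for the two results; stands in for the Pair described above.
    static final class Parsed {
        final List<String> arguments = new ArrayList<>();
        // LinkedHashMap keeps insertion order and allows null values.
        final Map<String, String> properties = new LinkedHashMap<>();
    }

    // Assumes quoting has already been resolved, so each token is one argument.
    static Parsed classify(List<String> tokens) {
        Parsed out = new Parsed();
        int i = 0;
        while (i < tokens.size()) {
            String token = tokens.get(i);
            if (token.startsWith("-")) {
                // A property is bound to the next token unless that token is itself a property.
                boolean hasValue = i + 1 < tokens.size() && !tokens.get(i + 1).startsWith("-");
                out.properties.put(token, hasValue ? tokens.get(i + 1) : null);
                i += hasValue ? 2 : 1;
            } else {
                out.arguments.add(token);
                i++;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Parsed p = classify(List.of("blah", "blah", "blah", "--player", "Notch",
                "--reason", "He stole my diamond hoe", "--random-flag", "-f"));
        System.out.println(p.arguments);
        System.out.println(p.properties);
    }
}
```

    The hard part, turning the raw input into those tokens when quotes are involved, is deliberately left out here.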


    My idea

    I thought you could join the entire arguments list into a String, then go through it char by char. But that gets overwhelming quickly and is prone to bugs and oversights.

    So I thought of making a regular expression that matches only what is in quotes, so given the String
    hello "test \" 2" test "world "
    it would match `"test \" 2"` and `"world "`. The regex also takes escaped quotes into consideration. I then store each quoted text, generate a unique String (a random UUID) for it, and replace the quoted text in the input with that unique String, so the quoted text is now 1 argument, which is way easier to parse.
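
    A regex along the lines described above might look like the sketch below. This is an assumption about the pattern, not the one from the linked Kotlin code; QuotedRegexSketch, QUOTED and findQuoted are made-up names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuotedRegexSketch {
    // An opening quote, then any run of either "backslash followed by any char"
    // or "any char that is neither a quote nor a backslash", then a closing quote.
    static final Pattern QUOTED = Pattern.compile("\"(?:\\\\.|[^\"\\\\])*\"");

    // Collects every quoted section, quotes included, in order of appearance.
    static List<String> findQuoted(String input) {
        List<String> found = new ArrayList<>();
        Matcher m = QUOTED.matcher(input);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        // The input from the post: hello "test \" 2" test "world "
        for (String match : findQuoted("hello \"test \\\" 2\" test \"world \"")) {
            System.out.println(match);
        }
    }
}
```

    Because the escaped quote is consumed as a backslash pair, it does not end the match early.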

    The problem with this approach is that it's not infallible. Anyone could put a random UUID in the command, so there's still a minuscule chance that it parses wrongly.

    For example: lmao <uuid-here> -g <data>
    If <data> is parsed and assigned a UUID, and Java happens to generate a random UUID that exactly matches the provided <uuid-here>, you get lmao <data> -g <data> after the replacement.


    Here's the current code, in Kotlin. Sorry I couldn't convert this well to Java.
    I have commented the most important pieces, so it should be comprehensible enough.

    Either way, I still think going through the String char by char is the best way. Any ideas?

    EDIT: Fixed the link because it linked to an uncommented piece.
     
  2. Offline

    KarimAKL

    Do not worry about that too much. Anyone that is able to help with providing the best approach to the problem should be able to understand it regardless.

    Anyway, I am no expert, so someone might have a better solution; however, I would probably go about it the same way as you: iterating over the chars in the string.
    Not much can be done about this, but you can split sections of the code into different methods and comment on the code to reduce the overwhelmingness.
    It happens to everyone. Make quite a few bug tests before release (if you plan on releasing it) and then fix any upcoming bugs or oversights.
     
    Shqep likes this.
  3. Offline

    Shqep

    Karim gave me the motivation to try another approach to the problem. The time complexity of the algorithm is probably not great, but at least it's not quadratic.
    It's probably not good practice, but I personally write algorithms in Python first, since it's a very straightforward and clean language, and then convert them to Java/Kotlin.

    Here's something I whipped up in the hope that it runs in near-linear time: cli.py
    Any improvement suggestions are appreciated.
     
    Last edited: Aug 10, 2021
  4. Offline

    davidclue

    @Shqep I'm not sure I understand the problem, but why can't you just cycle through the args list and, for each element, check whether it starts with a "-"? If it does, grab the next element and check whether it starts with a ". If not, pair the two together. If it does, cycle through the following elements, appending them to a string, and check each with .contains("\""); when one matches, end the string and pair it with the property. Any element that does not start with "-" goes into the arguments list.
     
  5. Offline

    Shqep

    It is because I tried this in the first version, and there are cases with multiple quotes inside an argument, or escaped quotes, where the parser gets it wrong.
    Such as when you input (hello "sadwa "sdadw). If you want to check for these cases, you still have to iterate through each char of the argument, and may miss some spaces. But when you iterate char by char over the entire String, you can turn the arguments list from [hello, "sadwa ,"sdadw] into [hello, sadwa ,sdadw].
    It's quite hard to explain, or I'm bad at explaining.
     
  6. Offline

    davidclue

    @Shqep You don't need to iterate through it char by char. You can use string.startsWith("\"") and then loop through the next elements, adding each to a separate string for the sentence. For every element, use string.contains("\""); if it does contain a quotation mark, break the loop after adding that final element and return the new string.
     
  7. Offline

    Shqep

    @davidclue
    .contains internally loops char by char and checks for the parameter you passed in. And if there are multiple quotes in the argument, .contains will still just return true. If so, how do you know where to split?
    And if you use .contains, then .split, then append, that's 3 loops: the 1st to find whether it contains the mark, the 2nd to split the String into a list, and the 3rd to append. I don't get why that would be more efficient than looping once, char by char.

    Or maybe you don't get my idea and I don't get yours. I don't know. It's weird.
     
  8. Offline

    davidclue

    @Shqep
    Okay, so then use endsWith("\"") to check whether a quote finishes it. I don't see why you would spend so much time letting them input internal quotes when they can just use ' or ` as a substitute. This seems like the most efficient way to do it, and you won't be going char by char.
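
    For reference, the startsWith/endsWith approach suggested here could be sketched roughly as follows. This is one reading of the suggestion, not code posted in the thread; TokenMergeSketch and merge are made-up names, and it assumes the input is the already space-split args list.

```java
import java.util.ArrayList;
import java.util.List;

class TokenMergeSketch {
    // Joins tokens from one that starts with a quote through the next one that
    // ends with a quote, and strips the two outer quotes. Inner quotes are kept as-is.
    static List<String> merge(List<String> tokens) {
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < tokens.size()) {
            String token = tokens.get(i);
            if (token.startsWith("\"")) {
                StringBuilder sb = new StringBuilder(token);
                // A lone quote both starts and ends with a quote, so require length > 1.
                boolean closed = token.length() > 1 && token.endsWith("\"");
                while (!closed && i + 1 < tokens.size()) {
                    i++;
                    String next = tokens.get(i);
                    sb.append(' ').append(next);
                    closed = next.endsWith("\"");
                }
                String joined = sb.toString();
                // Drop the opening quote, and the closing one if it was found.
                int end = closed ? joined.length() - 1 : joined.length();
                out.add(joined.substring(1, end));
            } else {
                out.add(token);
            }
            i++;
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(merge(List.of("--reason", "\"He", "stole", "my", "diamond", "hoe\"")));
        System.out.println(merge(List.of("\"zhongli", "is", "so", "h\"ot\"")));
    }
}
```

    In this sketch a quote in the middle of a token is never treated as a delimiter, so whether that is acceptable depends on which inputs the command should allow.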
     
  9. Offline

    Shqep

    @davidclue
    I appreciate that you are actively trying to help me get a more efficient solution. And for that, thank you.

    But if you check with endsWith, you can't know whether there are multiple quotes inside the String. Say a String is passed in like this (in backticks): `"zhongli is so h"ot"`. Using endsWith on the last argument would yield true, so would you parse that String as the list [zhongli is so h"ot] or [zhongli is so h, ot]?
     
  10. Offline

    davidclue

    @Shqep But why would you accept internal quotations when they could just use ' or ` as a substitute? If you want it done more efficiently, the best way is simply to not accept that kind of input.
     
  11. Offline

    Shqep

    I kind of don't get why accepting other characters as substitutes would solve the parser's problem. I think we're going off topic here?
     
  12. Offline

    davidclue

    @Shqep If they want a property to have internal quotations, they could just use a substitute instead. You don't need to accept commands that look like "zhongli is so h"ot". If you do that, you won't have to go char by char, but if you really want input like that to be accepted, then there is no other way to do it.
     
    Shqep likes this.
  13. Offline

    Shqep

    I'm guessing there is no better or easier way to parse the command line as desired. I whipped up a parser of my own in Java:
    Java

    Code:
    import java.util.ArrayList;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    /**
     * Parse a list of command arguments similarly to a parser that operates
     * within a command line interface.
     * For example, the create command for Multiverse has the following syntax:<br>
     * <code>
     *     create {NAME} {ENV} [-s SEED] [-g GENERATOR[:ID]] [-t TYPE] [-a true|false]
     * </code>
     * <hr>
     * With a list of arguments passed in like so:<br>
     * <code>
     *     create "test world" normal -s 123123123 -g "generator" -t "flat" -a true
     * </code>
     * <hr>
     * This will return a pair, with the first being arguments and second being a properties map.
     * Leading dashes are stripped from property names, each property maps to a list of its values,
     * and a property without a value maps to a list holding a single null:<br>
     * <code>
     *     arguments = [create, test world, normal]<br>
     *     properties = {
     *         s: [123123123],
     *         g: [generator],
     *         t: [flat],
     *         a: [true]
     *     }
     * </code>
     * @param list The arguments to parse from.
     * @return The aforementioned pair.
     */
    public static Pair<List<String>, Map<String, List<String>>> parseAsCLI(final List<String> list) {
        final String args = String.join(" ", list);

        boolean isRecording = false; // true while inside a quoted section
        final List<String> arguments = new ArrayList<>();
        final StringBuilder current = new StringBuilder();
        int pointer = 0;

        // First loop to take care of all the arguments and quotations.
        while (pointer < args.length()) {
            if (isRecording) {
                if (args.charAt(pointer) == '\\' && pointer + 1 < args.length() && args.charAt(pointer + 1) == '"') {
                    // Escaped quote: keep the quote, skip the backslash.
                    current.append("\"");
                    pointer += 2;
                } else if (args.charAt(pointer) == '"') {
                    // Closing quote ends the current argument.
                    isRecording = false;
                    arguments.add(current.toString());
                    current.delete(0, current.length());
                    pointer++;
                } else {
                    current.append(args.charAt(pointer));
                    pointer++;
                }
            } else {
                if (args.charAt(pointer) == '"') isRecording = true;
                else if (args.charAt(pointer) == ' ') {
                    if (current.length() > 0) {
                        arguments.add(current.toString());
                        current.delete(0, current.length());
                    }
                } else current.append(args.charAt(pointer));
                pointer++;
            }
        }

        if (current.length() > 0) arguments.add(current.toString());
        final Map<String, List<String>> properties = new HashMap<>();
        final List<String> leftovers = new ArrayList<>();
        pointer = 0;

        // Second loop to sort properties and leftover arguments.
        // Can this be optimized to do this all in 1 loop? Probably.
        while (pointer < arguments.size()) {
            if (arguments.get(pointer).startsWith("-")) {
                final String value;
                final int add;
                if (pointer + 1 < arguments.size() && !arguments.get(pointer + 1).startsWith("-")) {
                    value = arguments.get(pointer + 1);
                    add = 2;
                } else {
                    value = null;
                    add = 1;
                }
                final String prop = arguments.get(pointer).replaceAll("^-+", "");
                properties.putIfAbsent(prop, new ArrayList<>());
                // Store at most one null per property.
                if (value != null || !properties.get(prop).contains(null)) properties.get(prop).add(value);
                pointer += add;
            } else {
                leftovers.add(arguments.get(pointer));
                pointer++;
            }
        }

        return new Pair<>(leftovers, properties);
    }


    Pair class

    Code:
    static class Pair<A, B> {
        private final A first;
        private final B second;

        public Pair(final A first, final B second) {
            this.first = first;
            this.second = second;
        }

        public A getFirst() {
            return first;
        }

        public B getSecond() {
            return second;
        }
    }
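
    On the question in the code comment above about doing everything in 1 loop: one possible single-pass variant is sketched below. It is only a sketch under the same rules as parseAsCLI (leading dashes stripped, a valueless property stored as a single null), not a tested replacement; SinglePassSketch, accept and flushPending are made-up names, and the results are exposed as fields instead of a Pair to keep it short.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class SinglePassSketch {
    final List<String> arguments = new ArrayList<>();
    final Map<String, List<String>> properties = new HashMap<>();
    private String pendingProperty = null; // property still waiting for a value

    // Routes one finished token to the properties map or the arguments list.
    private void accept(String token) {
        if (token.startsWith("-")) {
            flushPending(); // the previous property, if any, got no value
            pendingProperty = token.replaceAll("^-+", "");
        } else if (pendingProperty != null) {
            properties.computeIfAbsent(pendingProperty, k -> new ArrayList<>()).add(token);
            pendingProperty = null;
        } else {
            arguments.add(token);
        }
    }

    // A property that never received a value is stored with at most one null entry.
    private void flushPending() {
        if (pendingProperty == null) return;
        List<String> values = properties.computeIfAbsent(pendingProperty, k -> new ArrayList<>());
        if (!values.contains(null)) values.add(null);
        pendingProperty = null;
    }

    static SinglePassSketch parse(List<String> list) {
        SinglePassSketch out = new SinglePassSketch();
        String args = String.join(" ", list);
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < args.length(); i++) {
            char c = args.charAt(i);
            if (inQuotes && c == '\\' && i + 1 < args.length() && args.charAt(i + 1) == '"') {
                current.append('"');
                i++; // skip the escaped quote
            } else if (c == '"') {
                if (inQuotes) { // closing quote ends the token, even an empty one
                    out.accept(current.toString());
                    current.setLength(0);
                }
                inQuotes = !inQuotes;
            } else if (c == ' ' && !inQuotes) {
                if (current.length() > 0) {
                    out.accept(current.toString());
                    current.setLength(0);
                }
            } else {
                current.append(c);
            }
        }
        if (current.length() > 0) out.accept(current.toString());
        out.flushPending();
        return out;
    }

    public static void main(String[] args) {
        SinglePassSketch r = parse(List.of("create", "\"test", "world\"", "normal",
                "-s", "123123123", "-g", "\"generator\"", "-t", "\"flat\"", "-a", "true"));
        System.out.println(r.arguments);
        System.out.println(r.properties.get("s"));
    }
}
```

    The idea is that a token is classified the moment it is completed, so the intermediate arguments list is no longer needed.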
     
    davidclue likes this.