Is `krename`'s regex broken, or am I bad at it?

rokejulianlockhart · February 21, 2023, 11:16pm

I’m trying to replace the entire input string (the filename) with the expression in the replacement field.

Attempt #1

Attempt #2

I’ve made many more attempts, listed below.

Is this working correctly (my regex might be wrong) or is krename’s regex borked? I’ve even asked ChatGPT, to little avail:

The conversation

(I hope that this helps.)

@rokejulianlockhart

What’s the correct regex to match everything? I’m trying to match everything to replace that with a certain string using krename, but if I use . as the regex, each character is replaced by the string, which causes undesirable duplication. (Do I perhaps need a capture group?)

To match everything, you can use the regex pattern .*, which matches zero or more occurrences of any character, including whitespace and special characters.

However, if you want to replace all occurrences of a certain string with another string, you should use the regex pattern that matches the specific string you want to replace. For example, if you want to replace all occurrences of the string “foo” with the string “bar”, you would use the regex pattern foo and replace it with bar.

To avoid undesirable duplication, you can use a capture group to replace the matched string with itself, wrapped by a custom string. For example, if you want to replace all occurrences of the string “foo” with the string “barfoo”, you can use the regex pattern (foo) and replace it with bar\1, where \1 represents the first capture group.

In krename, you can use the following syntax to replace all occurrences of a string using a capture group:

Search pattern: foo
Replace pattern: bar\0

The \0 represents the entire match and will be replaced with the capture group.

@rokejulianlockhart

\0 doesn’t appear to work, and .* still creates 4 of the string rather than 1, although it’s definitely an improvement since just . creates about 17 instances of the string instead. (.*?) doesn’t work at all.

ChatGPT

I apologize for the confusion in my previous response. In krename, you can use \0 to represent the entire match, not the capture group.

To replace all occurrences of a string using a capture group in krename, you can use the following syntax:

Search pattern: (foo)
Replace pattern: bar\1

The \1 represents the first capture group.

Regarding your issue with .* creating multiple instances of the string, this may be because .* matches any character, including whitespace and special characters, and can match multiple times. To match the entire string, you can use ^.*$ as the search pattern, which matches any character from the beginning (^) to the end ($) of the string.

Alternatively, if you only want to match a specific portion of the string, you can modify the .* pattern to match only the desired characters. For example, if you want to match any sequence of letters, numbers, and underscores, you can use the pattern \w+, which matches one or more word characters.

I hope this helps!

@rokejulianlockhart

^.*$ is even better, but it still matches twice instead of once. Once is what I’m hoping to achieve. Any idea of what might be better?

ChatGPT

I see, if the regex pattern ^.*$ is still matching multiple times, it’s possible that the text you’re trying to match contains newline characters, which are not matched by the . character.

To match everything including newline characters, you can use the [\s\S]* pattern, which matches any whitespace or non-whitespace character zero or more times. This pattern should match the entire text in one go, even if it contains newline characters.

So for your use case, you could try using [\s\S]* as the search pattern to match everything, and then replace it with the desired string. For example, if you want to replace all occurrences of the string “foo” with the string “bar”, you could use the following:

Search pattern: [\s\S]*
Replace pattern: bar

This will replace the entire matched text with the string “bar”, effectively replacing everything.

I hope this solution works for you!

@rokejulianlockhart

Unfortunately, whereas ^.*$ matches twice, [\s\S]* matches 4 times instead… Any more ideas?

ChatGPT

I’m sorry to hear that [\s\S]* is still matching multiple times. Another possibility is that there might be some whitespace characters at the beginning or end of the string that are causing multiple matches.

You could try using the \A and \z anchors instead of ^ and $. The \A anchor matches the beginning of the string, and the \z anchor matches the end of the string, even if there are newline characters present.

So for your use case, you could try using the \A and \z anchors with the [\s\S]* pattern to match the entire string in one go, like this:

Search pattern: \A[\s\S]*\z
Replace pattern: bar

This should match the entire string in one go, without matching any extra characters.

I hope this solution works for you!

@rokejulianlockhart

\A[\s\S]*\z doesn’t match whatsoever. (Perhaps krename is at fault.)

ChatGPT

I see. It’s possible that krename doesn’t support the \A and \z anchors, or that there’s some other issue with the way it’s handling regular expressions.

In that case, you might try using the [\s\S]* pattern with a non-greedy qualifier, such as ?, which will match the smallest possible string instead of the largest possible one.

So for your use case, you could try using the following pattern:

Search pattern: [\s\S]*?
Replace pattern: bar

This should match the entire string in one go, without matching any extra characters.

I hope this solution works for you!

@rokejulianlockhart

[\s\S]*? doesn’t match either.

ChatGPT

I apologize for the confusion. Another possible solution is to use the [^] character class, which matches any character including newline characters.

So for your use case, you could try using the following pattern:

Search pattern: [^]*
Replace pattern: bar

This should match the entire string in one go, without matching any extra characters.

I hope this solution works for you!

@rokejulianlockhart

Doesn’t match anything either.

ChatGPT

I’m sorry to hear that none of the previous solutions have worked for you. It’s possible that there may be some issue with the way krename is interpreting the regular expressions.

As a workaround, you could try using a two-step process to replace the desired string.

First, you could use a simple pattern like foo to match the string you want to replace. Then, you could use the & symbol in the replacement pattern to represent the entire matched text. Finally, you could replace the entire matched text with the desired string using a text editor that supports regular expressions, like sed or awk.

For example, if you want to replace all occurrences of the string “foo” with the string “bar”, you could use the following steps:

In krename, use the following pattern to match all occurrences of “foo”:

Search pattern: foo
Replace pattern: &

This will create a copy of the matched text in the replacement string.

Save the renamed files and use a text editor with regular expression support to replace all instances of foo with bar in the file names.

If you have a large number of files to rename, you could use a command-line tool like sed or awk to perform the second step in a batch process.

I hope this workaround helps you achieve your desired result!

@rokejulianlockhart

When I mentioned that ^.*$ matched it twice, you said that that might be because the string contains newlines. It doesn’t. The original filename is (1675875785).bin.txt.log. I want to replace it with ([year][month][day]T[hour][minute][second].#{0;1}) to get (20230221T230253.0).log. I’ve instructed krename to manually protect the last file extension, so the regex matching everything shouldn’t be a problem since krename should append .log (the file extension) at the end.

ChatGPT

Thank you for the additional information. Given that the file name you provided does not contain any newline characters, it’s strange that some of the regular expressions we’ve tried so far aren’t working as expected.

If you’re still having trouble, you could try using a slightly different pattern, such as [\s\S]* or [^]* with a reluctant quantifier *? to ensure that only one match is returned.

Here’s an example search pattern that should match the entire file name, including the extension:

[\s\S]*?\.[^.]*$

This pattern matches any character (including newlines) any number of times, as few times as possible, up to the last period (.) in the string. It then matches any non-period character ([^.]*) up to the end of the string ($), which should be the file extension.

In the replace pattern, you can use capture groups to insert the parts of the new file name you want. For example, you can use the following replace pattern to replace the file name with the desired format:

(\d{4})(\d{2})(\d{2})T(\d{2})(\d{2})(\d{2}).#{0;1}.log

This pattern captures the year, month, day, hour, minute, and second from the original file name using groups (\d{4}), (\d{2}), etc. It then inserts those captured groups into the desired format for the new file name. The #{0;1} in the replace pattern is a krename-specific syntax for generating a unique number (or not), so you can leave that as-is.

I hope this helps!

@rokejulianlockhart

Unfortunately, although that looks like brilliantly specialized regex, it doesn’t work either. Perhaps a version of ^.*$ which doesn’t match newlines might work, since that might be what’s duplicating it?

ChatGPT

I apologize for the continued issues you’re having with this. Here’s another pattern you could try that matches any character except newline, and uses a reluctant quantifier to ensure only one match is returned:

[^\n]*?

You can use this pattern in the search field of krename to match the entire file name, and then use capture groups in the replace field to create the desired format for the new file name. For example, you could use the following replace pattern to create a file name in the format you specified earlier:

(\d{4})(\d{2})(\d{2})T(\d{2})(\d{2})(\d{2}).#{0;1}.log

This pattern captures the year, month, day, hour, minute, and second from the original file name using groups (\d{4}), (\d{2}), etc., and inserts those captured groups into the desired format for the new file name. The #{0;1} in the replace pattern is a krename-specific syntax for generating a unique number (or not).

I hope this pattern works for you!

rokejulianlockhart · February 21, 2023, 11:22pm

Also, how strange that some screenshots contain a wallpaper whereas others don’t. Perhaps that’s a bug of Spectacle’s (or perhaps I’m also somehow using that incorrectly, XD).

phunh · February 26, 2023, 10:59am

I don’t use krename much, but after playing around with it a bit, this is what I’ve found. Also, the conversation with ChatGPT doesn’t help much, other than providing the expressions you used for easier testing (text can’t be copied from the picture), so I’d suggest removing it and keeping only the text you entered into krename during your attempts. ChatGPT is great, sure, but it can’t know everything.

So the first thing to know about krename in your attempts is that “find and replace” is applied after the file name has been built from the template. “The template” is everything in the “Filename” box on the “Simple Filename” tab; on the “Advanced Filename” tab, it is right at the top. That is where you should put your expressions/krename “functions”; then, when you check the “Use extension of the input file” box, you will see the old extension actually being kept. “Find and replace” is applied after that, to both the file name and the extension parts, so it doesn’t keep the old extension.

For your first attempt, the regex only matches the first character of the string, so only the first character of the file name and the first character of the extension are replaced.

In your second attempt, the regex matches the whole string, so both the file name and the extension are replaced.
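To make that concrete, here is a minimal Python sketch of the behaviour described above. It rests on two assumptions about krename’s internals that this thread does not confirm: the find-and-replace runs separately on the name part and on the extension part, and krename’s regex engine treats empty matches the way Python 3.7+’s `re.sub` does. `apply_to_parts` is a hypothetical helper. Under those assumptions it gives the same counts reported earlier: `^.*$` yields the replacement twice (once per part), while `.*` and `[\s\S]*` yield it four times (each part is matched whole, then once more as an empty match at its end).

```python
import re

def apply_to_parts(pattern: str, replacement: str, name: str, ext: str) -> str:
    """Hypothetical model: run the same substitution on the name part and on
    the extension part independently, then rejoin them with a dot."""
    new_name = re.sub(pattern, replacement, name)
    new_ext = re.sub(pattern, replacement, ext)
    return f"{new_name}.{new_ext}"

# Name/extension split of "(1675875785).bin.txt.log" at the last dot.
NAME, EXT = "(1675875785).bin.txt", "log"

for pat in (r"^.*$", r".*", r"[\s\S]*"):
    result = apply_to_parts(pat, "NEW", NAME, EXT)
    print(f"{pat!r:12} -> {result!r} ({result.count('NEW')} copies)")
```

In Python, `^.*$` cannot match again at the end of a part (without `re.MULTILINE`, `^` only matches at position 0), which is why it produces one copy per part rather than two.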

rokejulianlockhart · February 27, 2023, 6:42pm

The reason I’m trying to do it this way is that I sometimes have multiple operations to perform on the entire filename, and the find-and-replace list is the only way to schedule such operations in order. Importing scheduled operations also requires the XML file that the find-and-replace window creates:

([year][month][day]T[hour][minute][second].#{0;1}+0000)

phunh · February 27, 2023, 9:53pm

I don’t know what your “operations” are; what I’m trying to help you with here is using krename to rename multiple files in the format (1675875785).bin.txt.log to ([year][month][day]T[hour][minute][second].#{0;1}).log. Let’s keep it straightforward, okay?

1. Select all the files that you want to rename, and open krename so that those files are listed in it.
2. If you still have your “find and replace” setting, remove it.
3. In the “Advanced Filename” tab, put ([year][month][day]T[hour][minute][second].#{0;1}) into the text box next to “Template:”; leave the text box on the right alone.
4. Check the list of file names: are they what you want?
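As a rough illustration of what that template produces, here is a Python sketch that imitates it. It assumes `[year]`, `[month]`, `[day]`, `[hour]`, `[minute]` and `[second]` expand to the zero-padded fields of some timestamp (which timestamp krename uses is not established in this thread), and that `#{0;1}` is a counter starting at 0 with step 1, consistent with the `(20230221T230253.0).log` result quoted earlier. `render_template` is a hypothetical helper, not part of krename.

```python
from datetime import datetime

def render_template(ts: datetime, index: int, extension: str,
                    start: int = 0, step: int = 1) -> str:
    """Hypothetical imitation of
    ([year][month][day]T[hour][minute][second].#{0;1}) plus a kept extension."""
    stamp = ts.strftime("%Y%m%dT%H%M%S")  # [year][month][day]T[hour][minute][second]
    counter = start + index * step         # #{start;step}, assumed semantics
    return f"({stamp}.{counter}).{extension}"

print(render_template(datetime(2023, 2, 21, 23, 2, 53), 0, "log"))
```

Because the extension is appended separately here, this mirrors the “Use extension of the input file” setting rather than any find-and-replace step.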

rokejulianlockhart · March 1, 2023, 3:46pm

Yes, they are. Thank you; I’m familiar with that field, and it’s what I used to verify that the krename-specific template works correctly.

However, you’ve misinterpreted the reason I’ve asked this question. I’m unsure how to rephrase my earlier post (#4 by rokejulianlockhart), but please reread it. I want to perform something similar to

so I need to be able to include ([year][month][day]T[hour][minute][second].#{0;1}+0000) in the XML file that Save Settings generates:

```xml
<!DOCTYPE KRename>
<KRename version="1.0"><FindReplaceList rows="2"><Row num="0"><RegularExpression>0</RegularExpression><Find>find 1</Find><ReplaceWith>replace 2</ReplaceWith><ProcessTokens>0</ProcessTokens></Row><Row num="1"><RegularExpression>0</RegularExpression><Find>find 2</Find><ReplaceWith>replace 3</ReplaceWith><ProcessTokens>0</ProcessTokens></Row></FindReplaceList></KRename>
```

because otherwise, I would need to manually reconstruct this sequence every time I want to perform the operation with krename.
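If the goal is to avoid rebuilding that sequence by hand, one option is to generate the settings file. Below is a minimal Python sketch that writes a find-and-replace list in the same shape as the sample above, with the template as the replacement. The element and attribute names are copied from that sample. Two things are assumptions, not verified against krename: that krename loads a file written this way, and that setting `ProcessTokens` to 1 makes it expand tokens such as `[year]`. `build_find_replace_xml` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

def build_find_replace_xml(rows: list[tuple[str, str, bool, bool]]) -> str:
    """Build a KRename find-and-replace list shaped like the saved sample.
    Each row is (find, replace_with, is_regex, process_tokens)."""
    root = ET.Element("KRename", version="1.0")
    frl = ET.SubElement(root, "FindReplaceList", rows=str(len(rows)))
    for num, (find, replace_with, is_regex, process_tokens) in enumerate(rows):
        row = ET.SubElement(frl, "Row", num=str(num))
        ET.SubElement(row, "RegularExpression").text = "1" if is_regex else "0"
        ET.SubElement(row, "Find").text = find
        ET.SubElement(row, "ReplaceWith").text = replace_with
        ET.SubElement(row, "ProcessTokens").text = "1" if process_tokens else "0"
    # ElementTree does not emit DOCTYPE declarations, so prepend it as text.
    return "<!DOCTYPE KRename>\n" + ET.tostring(root, encoding="unicode")

xml_text = build_find_replace_xml([
    (r"^\(\d*\)\.bin\.txt",
     "([year][month][day]T[hour][minute][second].#{0;1}+0000)",
     True, True),
])
print(xml_text)
```

Writing `xml_text` to a file gives something shaped like the Save Settings output; whether krename accepts it is the part that still needs testing.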

phunh · March 1, 2023, 9:45pm

OK, so let’s say that we really have to use find and replace. Then we need our regex to match only the name part and not the extension part. Could you use a regex like

^\(\d*\.bin\.txt\)

? Is there any file name that doesn’t match that regex?

rokejulianlockhart · March 1, 2023, 10:43pm

@phunh, it doesn’t appear to evaluate the regex whatsoever. Apologies if this is a fault of mine.

phunh · March 2, 2023, 12:04pm

Sorry, the closing parenthesis should be before “.bin.txt”. So the regex should be

^\(\d*\)\.bin\.txt

rokejulianlockhart · March 3, 2023, 3:53pm

Apologies. Still doesn’t appear to work.

phunh · March 3, 2023, 8:45pm

It works for me

What is selected in “File extension starts at:”? It should be “Last Dot”.
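Why that setting matters can be sketched in Python. This models krename’s name/extension split as happening at either the first or the last dot, and applies phunh’s regex to each part separately; the split rule and the helper names (`split_name`, `find_and_replace`) are assumptions for illustration, not krename code. With the split at the first dot, the name part is just `(1675875785)`, which cannot contain `.bin.txt`, so nothing matches; with the split at the last dot, the name part is `(1675875785).bin.txt` and the extension `log` is left alone.

```python
import re
from typing import Optional

PATTERN = re.compile(r"^\(\d*\)\.bin\.txt")
TEMPLATE = "NEW"  # stand-in for krename's expanded replacement

def split_name(filename: str, at_last_dot: bool) -> tuple[str, Optional[str]]:
    """Split into (name, extension) at the first or last dot (assumed rule)."""
    idx = filename.rfind(".") if at_last_dot else filename.find(".")
    if idx == -1:
        return filename, None
    return filename[:idx], filename[idx + 1:]

def find_and_replace(filename: str, at_last_dot: bool) -> str:
    """Apply the substitution to name and extension separately, then rejoin."""
    name, ext = split_name(filename, at_last_dot)
    new_name = PATTERN.sub(TEMPLATE, name)
    if ext is None:
        return new_name
    return f"{new_name}.{PATTERN.sub(TEMPLATE, ext)}"

original = "(1675875785).bin.txt.log"
print(find_and_replace(original, at_last_dot=False))  # name part lacks ".bin.txt"
print(find_and_replace(original, at_last_dot=True))   # name part matches, "log" kept
```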

rokejulianlockhart · March 3, 2023, 11:12pm

Yep, that was the problem! Thanks loads! I wish this instance of Discourse included the Q&A extension, because I’d mark that as the answer.

Any idea what the regex would be to simply match any string, though? Sometimes I need to rectify even more garbled unique filenames that aren’t just

*.bin.*

phunh · March 5, 2023, 1:01pm

As I said, what you provide in “find and replace” will be applied to both the name part and the extension part. Right now there’s no way that I know of to distinguish between name and extension parts, so the regex that “simply matches any string” for name will also match any string for extension. And that’s why I said “we need our regex to match only the name part and not the extension part”.

I think what you can do is to find something that is true for all file names, and false for all extensions.
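One way to follow that advice is to test a candidate regex against samples of both parts before putting it into krename. In this minimal Python sketch, `separates` (a hypothetical helper) returns True only if the pattern is found in every sample name and in no sample extension. The sample lists and the candidate `^\(` (“the name starts with a parenthesis”) are illustrative assumptions; substitute your own filenames, and bear in mind that Python’s `re` may differ from krename’s regex engine in edge cases.

```python
import re

def separates(pattern: str, names: list[str], extensions: list[str]) -> bool:
    """True if `pattern` is found in every name part and in no extension part."""
    rx = re.compile(pattern)
    return (all(rx.search(n) for n in names)
            and not any(rx.search(e) for e in extensions))

# Hypothetical samples, split at the last dot.
names = ["(1675875785).bin.txt", "(1675875790).bin"]
extensions = ["log", "txt"]

print(separates(r"^\(", names, extensions))  # names start with "(", extensions don't
print(separates(r".*", names, extensions))   # matches everything, so it can't separate
```

A pattern like `.*` always fails this check, which is the same reason a “match any string” regex rewrites the extension too.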

rokejulianlockhart · May 25, 2023, 1:52pm

It does now. I’ll mark it, @phunh.