regex: June 2015

Sunday, 28 June 2015

How do I search/replace all 'n(' -> 'n*(' and ')n' -> ')*n' within same equation?

Scenario:
The user can enter any number of parenthesis pairs into an equation in String format. However, I need to make sure that every parenthesis '(' or ')' has an adjacent multiplication symbol '*'. Hence '3(' should become '3*(' and ')3' should become ')*3'.

I need to replace all occurrences of 'n(' with 'n*(' and ')n' with ')*n'.

Example: 1+5(3+4)7/2 ---> 1+5*(3+4)*7/2

What is the correct regex to do this?

I was thinking of something like '[0-9](' and ')[0-9]'. But I don't know the full syntax to search for all occurrences of these patterns and insert the '*'.
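
A minimal sketch of one way to do this, written in Python only because the question names no language; the function name add_implicit_multiplication is made up for illustration. It captures the digit next to each parenthesis and writes it back with a '*' in between:

```python
import re

def add_implicit_multiplication(expr):
    """Insert '*' between a digit and '(' and between ')' and a digit."""
    expr = re.sub(r'(\d)\(', r'\1*(', expr)   # 'n(' -> 'n*('
    expr = re.sub(r'\)(\d)', r')*\1', expr)   # ')n' -> ')*n'
    return expr

print(add_implicit_multiplication('1+5(3+4)7/2'))
```

The parentheses have to be escaped in the pattern, since an unescaped '(' opens a capture group.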

Making a [code][/code] for BBcode with php regex

I would like to make a [code][/code] tag for bbcode so that what would be inside wouldn't be taken into account by the php regex that I made.

Example :

Hello [b]newbie[/b], to write in bold, use the following : [code][b](YOURTEXT)[/b][/code]

Should return in HTML :

Hello <strong>newbie</strong>, to write in bold, use the following : [b](YOURTEXT)[/b]

Here is a view of a part of my bbcode function :

<?
function bbcode($var) {
   $var = preg_replace('`\[b\](.+)\[/b\]`isU', '<strong>$1</strong>', $var); 
   $var = preg_replace('`\[i\](.+)\[/i\]`isU', '<em>$1</em>', $var);
   $var = preg_replace('`\[u\](.+)\[/u\]`isU', '<u>$1</u>', $var);
   return $var;
}
?>

Thank you in advance for your kind help !

EDIT : Here is how I finally made it work :

<? 
function bbcode($var) {
$var2 = preg_split('`(\[code].*?\[/code])`isU', $var, -1, PREG_SPLIT_NO_EMPTY | PREG_SPLIT_DELIM_CAPTURE);

$var = preg_replace('`\[b\](.+)\[/b\]`isU', '<strong>$1</strong>', $var); 
$var = preg_replace('`\[i\](.+)\[/i\]`isU', '<em>$1</em>', $var);
$var = preg_replace('`\[u\](.+)\[/u\]`isU', '<u>$1</u>', $var);

$var = preg_replace('`(\[code].*?\[/code])`isU', $var2[1], $var);
$var = preg_replace('`\[code\](.+)\[/code\]`isU', '<div>$1</div>', $var);
return $var;
}

$text = 'Hello [b]newbie[/b], to write in bold, use the following [u]lol[/u] : [code][b](YOURTEXT) [u]lol[/u][/b][/code] [b][u]LOL[/u][/b]';

echo bbcode($text); 
?>

HOWEVER, there is one problem left: if the string starts directly with '[code]', for example

[code][b]hello[/b][/code] test

then the result will be:

test test

This is because $var2[1] now leads to what comes after the [/code].

Could someone please help me find a better way to delimit the blocks that would also work for that second string? Thank you in advance!
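
One possible way around the index problem is to stop relying on a fixed position such as $var2[1] and to walk over every piece returned by the split instead. The sketch below shows the idea in Python rather than PHP (an assumption made purely for illustration; the names CODE_BLOCK and INLINE_TAGS are hypothetical). Splitting on a capturing group puts the [code] blocks at the odd indexes of the result, wherever they occur in the string:

```python
import re

CODE_BLOCK = re.compile(r'(\[code\].*?\[/code\])', re.I | re.S)
INLINE_TAGS = [
    (re.compile(r'\[b\](.+?)\[/b\]', re.I | re.S), r'<strong>\1</strong>'),
    (re.compile(r'\[i\](.+?)\[/i\]', re.I | re.S), r'<em>\1</em>'),
    (re.compile(r'\[u\](.+?)\[/u\]', re.I | re.S), r'<u>\1</u>'),
]

def bbcode(text):
    # With one capturing group, split() returns: text, block, text, block, ...
    parts = CODE_BLOCK.split(text)
    out = []
    for index, part in enumerate(parts):
        if index % 2 == 1:
            # A [code] block: drop the 6-char opener and 7-char closer, keep the inside as-is.
            out.append('<div>' + part[6:-7] + '</div>')
        else:
            # Ordinary text: apply the inline tag replacements.
            for pattern, repl in INLINE_TAGS:
                part = pattern.sub(repl, part)
            out.append(part)
    return ''.join(out)

print(bbcode('[code][b]hello[/b][/code] test'))
```

In PHP the same structure should be reachable with preg_split and PREG_SPLIT_DELIM_CAPTURE, provided PREG_SPLIT_NO_EMPTY is left out so that the odd/even positions stay stable.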

Regex - parsing string into groups

I have specific log messages and I would like to parse them into groups. I would also like the regex to handle an alternative, more specific form of the message.

My logs:

18:48:24:284 => [DEBUG] [xxx.yyy.zzz] [8] Message1
18:48:24:671 => [INFO] [uuu.www.aaa] [8] Method: 'ReturnType MethodName(MethodParameter)'. Line: ~30. Message2

I have written a regex:

(?<timestamp>\d+:\d+:\d+:\d+.*)\s+=>\s+\[(?<level>\w+)\]\s+\[(?<emmiter>.*)\]\s+\[(?<thread>\d+)\]\s+(?<message>.*)

which parses these messages into specific groups:

timestamp: 18:48:24:284
level: DEBUG
emmiter: xxx.yyy.zzz
thread: 8
message: Message1

timestamp: 18:48:24:671
level: INFO
emmiter: uuu.www.aaa
thread: 8
message: Method: 'ReturnType MethodName(MethodParameter)'. Line: ~30. Message2

But now I would like to add 2 more groups, filled only if they exist: method and line.

So, I would like to get results like this:

timestamp: 18:48:24:284
level: DEBUG
emmiter: xxx.yyy.zzz
thread: 8
method:
line: 
message: Message1

timestamp: 18:48:24:671
level: INFO
emmiter: uuu.www.aaa
thread: 8
method: ReturnType MethodName(MethodParameter)
line: ~30
message: Message2

Can you please help me with that? Everything I do results in parsing only Line1 or only Line2 properly, but I would like to parse them both with one regex.
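
A sketch of one way to handle both lines with a single pattern: wrap the "Method: '...'. Line: ... ." prefix in an optional non-capturing group. It is written in Python, so the named groups use (?P<name>...) instead of (?<name>...); the group names follow the question, and the exact prefix layout is an assumption based on the two sample lines:

```python
import re

LOG_LINE = re.compile(
    r"(?P<timestamp>\d+:\d+:\d+:\d+)\s+=>\s+"
    r"\[(?P<level>\w+)\]\s+"
    r"\[(?P<emmiter>[^\]]*)\]\s+"
    r"\[(?P<thread>\d+)\]\s+"
    # Optional prefix; when it is absent, 'method' and 'line' stay unset (None).
    r"(?:Method:\s+'(?P<method>[^']*)'\.\s+Line:\s+(?P<line>[^.\s]+)\.\s+)?"
    r"(?P<message>.*)"
)

def parse(line):
    match = LOG_LINE.match(line)
    return match.groupdict() if match else None

first = parse("18:48:24:284 => [DEBUG] [xxx.yyy.zzz] [8] Message1")
second = parse("18:48:24:671 => [INFO] [uuu.www.aaa] [8] "
               "Method: 'ReturnType MethodName(MethodParameter)'. Line: ~30. Message2")
```

Using [^\]]* instead of .* for the emitter keeps that group from running past its closing bracket.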

finding repeated characters in a row (3 times or more) in a string

Here is the code for finding repeated characters, like the A in AAbbbc:

String stringToMatch = "abccdef";
Pattern p = Pattern.compile("((\\w)\\2+)+");
Matcher m = p.matcher(stringToMatch);  // was `tweet`, which is not defined in this snippet
while (m.find())
{
    System.out.println("Duplicate character " + m.group(0));
}

Now the problem is that I want to find characters that are repeated 3 times or more in a row. When I change 2 to 3 in the above code, it does not work. Can anyone help?
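
A likely explanation is that the 2 in \\2 is a group number (a backreference to the second capture group), not a repeat count, so changing it to 3 points at a group that does not exist. The count belongs in a quantifier after the backreference. A minimal Python sketch of that idea (the function name is hypothetical):

```python
import re

def runs_of_three_or_more(text):
    # (\w) captures one character; \1{2,} requires at least two more copies of it.
    return [m.group(0) for m in re.finditer(r'(\w)\1{2,}', text)]

print(runs_of_three_or_more('AAbbbcdddd'))
```

In a Java string literal the same pattern would be written "(\\w)\\1{2,}".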

Trouble Understanding Regular Expression [duplicate]

This question already has an answer here:

Reference - What does this regex mean? 1 answer

I am trying to learn regular expressions. I have learnt some of the easier notations, but I am unable to understand what the expression below means. Please shed some light on it.

Sample Data :

create table targets ( target varchar2(15) );

REM INSERTING into TARGETS
SET DEFINE OFF;
Insert into TARGETS (TARGET) values ('aaa');
Insert into TARGETS (TARGET) values ('a b c');
Insert into TARGETS (TARGET) values ('b  b  b  b');
Insert into TARGETS (TARGET) values ('  bbb xx  ');
Insert into TARGETS (TARGET) values ('wdef def w');
Insert into TARGETS (TARGET) values ('defxdefdef');
Insert into TARGETS (TARGET) values (null);


select target,
       regexp_replace(target,'(d)ef|.','\1') intermediate_string

from targets;

If possible please direct me to some good site for learning Regular expressions. Thanks!
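
To see what the pattern does, it can help to emulate it outside the database. The sketch below is a Python approximation, not Oracle itself: the pattern tries (d)ef first and otherwise falls back to '.', and the replacement keeps only capture group 1. When the '.' branch matches, group 1 is unset, so the character is replaced by nothing. That Oracle treats an unset \1 as empty is an assumption here.

```python
import re

def intermediate_string(target):
    # Branch 1, (d)ef: keep only the captured 'd'.
    # Branch 2, '.': any other single character, group 1 unset, so it is dropped.
    return re.sub(r'(d)ef|.', lambda m: m.group(1) or '', target)

for sample in ['aaa', 'a b c', 'wdef def w', 'defxdefdef']:
    print(repr(sample), '->', repr(intermediate_string(sample)))
```

The net effect is one 'd' per occurrence of 'def' and nothing else.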

python regex use capture group to define another groups length { }

I am parsing streaming hex data with python regex. I have the following packet structure that I am trying to extract from the stream of packets:

'\xaa\x01\xFF\x44'

\xaa - start of packet
\x01 - data length [value can vary from 00-FF]
\xFF - data
\x44 - end of packet

I want to use Python regex to indicate how much of the data portion of the packet to match, like this:

r = re.compile('\xaa(?P<length>[\x00-\xFF]{1})(.*){?P<length>}\x44')

This compiles without errors, but it doesn't work. I suspect it doesn't work because the regex engine cannot convert the <length> named group's hex value to an integer for use inside the regex {} quantifier. Is there a way to accomplish this in Python without resorting to disseminating the match groups?

Background: I have been using erlang for packet unpacking and I was looking for something similar in python
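
As far as I know, Python's re module cannot use the value of a captured byte as a repeat count, so the length has to be read outside the regex. The following is a hand-rolled sketch under the packet layout described above (0xAA start, one length byte, data, 0x44 end); the function name extract_packets is hypothetical:

```python
def extract_packets(stream: bytes):
    """Return the data portion of every well-formed packet in the stream."""
    packets = []
    i = 0
    while i < len(stream):
        if stream[i] != 0xAA:          # not a start-of-packet marker
            i += 1
            continue
        if i + 1 >= len(stream):       # start marker with no length byte yet
            break
        length = stream[i + 1]
        end = i + 2 + length           # index where the 0x44 end marker should be
        if end < len(stream) and stream[end] == 0x44:
            packets.append(stream[i + 2:end])
            i = end + 1
        else:
            i += 1                     # false start, keep scanning
    return packets

print(extract_packets(b'\x00\xaa\x01\xff\x44\xaa\x02\x01\x02\x44'))
```

For fixed layouts the standard struct module is another option, but it does not express "length taken from an earlier field" by itself either.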

Php: How to ignore newline in Regex

I've already found a lot of stackoverflow questions about this topic. But I cannot find out the solution out of these questions for my problem.

I have the following html:

<p><a name="first-title"></a></p>
<h3>First Title</h3>
<h2><a href='#second'>Second Title</a></h2>
<h3>Third Title</h3>

I want to find the <h3> that is preceded by </a></p>. In this case, the output should be:

<h3>First Title</h3>

So I implement the following regular expression;

preg_match_all('/(?<=<\/a><\/p>)<h3>(.+?)<\/h3>/s',$html,$data);

The above regular expression does not match anything in the above html. But if I remove the newlines from the html, it correctly outputs my desired result.

I would not like to remove newlines from the html if possible. How should I develop regular expression to ignore the newlines from the source string?

Please, help me.
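
One possible fix is to leave the lookbehind as it is and allow optional whitespace between it and the <h3>, capturing the heading in a group so the whitespace is not part of the result. Sketched in Python for illustration; the same pattern should carry over to preg_match_all, where the heading would then be in $data[1]:

```python
import re

html = """<p><a name="first-title"></a></p>
<h3>First Title</h3>
<h2><a href='#second'>Second Title</a></h2>
<h3>Third Title</h3>"""

# The lookbehind stays fixed-width; \s* after it absorbs the newline.
headings = re.findall(r'(?<=</a></p>)\s*(<h3>.+?</h3>)', html, re.S)
print(headings)
```

The \s* cannot go inside the lookbehind, because lookbehinds have to be of fixed width in both engines.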

Python // Regex // Tags

I am trying to extract some text from between

</br></td>, <td class="first">TEXT_1a<br>TEXT_1b
                                </br></td>, <td class="first">TEXT_2a<br>TEXT_2b
                                </br></td>, <td class="first">TEXT_3a<br>TEXT_3b
                                </br></td>, <td class="first">TEXT_4a<br>TEXT_4b
                                </br></td>, <td class="first">TEXT_5a<br>TEXT_5b
                                </br></td>, <td class="first">TEXT_6a<br>TEXT_6b

I used BeautifulSoup (BS4) to extract the text:

text = first_td.renderContents()
trimmed_text = text.strip()
print trimmed_text

However, I only get the first text after the <td tag. I would like to extract all the text in the tags, preferably sorted into columns (an array). Since BS did not work for me, I thought regex might be the way to go. One minor thing: I am an absolute regex amateur...

Any ideas how to get the text out there?

Thank you in advance

Shell remove string including newlines

I am currently working on a custom source patcher and I'm having trouble replacing one string with another when newlines are involved.

For instance, I want to remove this pattern :

\n/* @patch[...]*/

In order to get this... :

this.is = code ;
/* @patch beta
    blah blah
*/
if (!this.is) return 0 ;
/* @patch end */

... to this :

this.is = code ;
if (!this.is) return 0 ;

And not this :

this.is = code ;
<- newline
if (!this.is) return 0 ;
<- newline

Using a shell script, I'm using sed command in order to do what I want :

sed -e "s|\/\* @patch.*\*\/||g" $file > $file"_2"

This works pretty well, but the newlines are still there.

This way doesn't work as sed can't parse newlines :

sed -e "s|\n\/\* @patch.*\*\/||g" $file > $file"_2"

Neither the method from "How can I replace a newline (\n) using sed?" nor tr (second answer in the same thread) works.

Would you have any solution to this ? Even heavy ones, performance is not important here.

P.S. : I am working on a web application, and in this case JavaScript files. Under Mac OS X Yosemite, but no matter what system I'm using, it seems to be a common issue for all bash users.

I found out another solution using Node.js for those who have troubles with their Awk version :

node -e "console.log(process.argv[1].replace(/[\n\r]\/\* @patch([\s\S]*?)\*\//mg, ''))" "`cat $filepath`"
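
For comparison, here is the same idea as the Node.js one-liner sketched in Python, which reads the whole source as one string so that the pattern can span lines. The names PATCH_COMMENT and strip_patch_comments are made up for this example:

```python
import re

# re.S lets '.' cross line breaks; '.*?' stops at the first closing '*/'.
PATCH_COMMENT = re.compile(r'\n/\* @patch.*?\*/', re.S)

def strip_patch_comments(source):
    return PATCH_COMMENT.sub('', source)

source = (
    "this.is = code ;\n"
    "/* @patch beta\n"
    "    blah blah\n"
    "*/\n"
    "if (!this.is) return 0 ;\n"
    "/* @patch end */\n"
)
print(strip_patch_comments(source), end='')
```

Matching the newline in front of the comment, rather than the one after it, is what removes the empty line.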

How do I make Wget name files as part of URL?

Short story:

I want Wget to name downloaded files as they match regex token ([^/]*)

wget -r --accept-regex="^.*/([^/]*)/$" $MYURL

Full story:

I use GNU Wget to recursively download one specific folder under particular WordPress website. I use regex to accept only posts and nothing else. Here is how I use it:

wget -r --accept-regex="^.*/([^/]*)/$" $MYURL

It works and Wget follows all the desired URLs. However, it saves files as .../last_directory/index.html, but I want these files to be saved as last_directory.html (.html part is optional).

Is there a way to do that with Wget alone? Or would you suggest how to do the same thing with sed or similar tools?

Replace text on every selected line in a textarea

I have a textarea in my HTML where users can type something. There is a button near the textarea that adds a ' - ' to the beginning of a selected line to make it like a list.

For example if I have the following text:

This is a line of text

And I selected it all and pressed the button it would appear like this:

- This is a line of text

However, this only works when selecting one line. If I was to select two lines, it will only format the one line. Here is my code:

Javascript with some jQuery:

var selection = $("#answer" + questionNumber).getSelection();
if (selection.text != '') {
    $("#answer" + questionNumber).replaceSelectedText('  - ' + selection.text);
    return;
}

The .getSelection() and .replaceSelectedText() methods come from this jQuery plugin.

I thought that maybe I could detect '\n' characters in the selection and add the list styling to each line, perhaps using regex, though I'm not sure how to go about coding that.

Here is a JSFiddle of my issue.
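
The regex part can be sketched as follows: in multiline mode, ^ matches at the start of every line, so replacing that empty position with the prefix marks each selected line. This is shown in Python for illustration (as_list is a hypothetical name); in JavaScript the equivalent would probably be selection.text.replace(/^/gm, '  - ').

```python
import re

def as_list(selection, prefix='  - '):
    # (?m) makes ^ match at the start of every line, not just the start of the text.
    return re.sub(r'(?m)^', prefix, selection)

print(as_list('This is a line of text\nAnd this is another'))
```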

Check whether the string is a number

I am trying to check whether a string is a number. I have tried the following alternatives, which work separately but not together.

if (i.matches("\\d{2} | [0-9]"))

I appreciate any help.
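
The likely culprit is the pair of spaces around the '|': inside a regex a space is a literal character, so the first alternative only matches two digits followed by a space, and the second only a space followed by a digit. A small Python sketch of the pattern without the spaces (fullmatch plays the role of Java's matches(), which also requires the whole string to match):

```python
import re

def is_one_or_two_digits(s):
    # No spaces around '|': a space in the pattern would have to appear in the input.
    return re.fullmatch(r'\d{2}|[0-9]', s) is not None

print(is_one_or_two_digits('7'), is_one_or_two_digits('42'), is_one_or_two_digits('123'))
```

The two alternatives could also be collapsed into \d{1,2}.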

How to escape xml for use in perl multiline search and replace?

I want to use perl to replace a key-value pair in an xml file, but I am having problems escaping the xml from the regex parser. All this is supposed to run from a bash script. So I have two variables:

macDefault="<key>DefaultVolume</key>\
                <string>91630106-4A1F-4C58-81E9-D51877DE2EAB</string>"

winDefault="<key>DefaultVolume</key>\
                <string>EBD0A8B3-EE3D-427F-9A83-099C37A90556</string>"

And I want perl to replace the occurrence of the value $macDefault with the value of $winDefault in the file config.plist

Unfortunately

perl -0pe  's/'"$macDefault"'/'"$winDefault"'/' config.plist

does not work, as perl reports:

Having no space between pattern and following word is deprecated at -e line 1.
Bareword found where operator expected at -e line 1, near "s/<key>DefaultVolume</key>                <string>91630106-4A1F-4C58-81E9-D51877DE2EAB</string"
Having no space between pattern and following word is deprecated at -e line 1.
Bareword found where operator expected at -e line 1, near "<string>EBD0A8B3"
        (Missing operator before EBD0A8B3?)
Bareword found where operator expected at -e line 1, near "427F"
        (Missing operator before F?)
Bareword found where operator expected at -e line 1, near "9A83"
        (Missing operator before A83?)
Bareword found where operator expected at -e line 1, near "099C37A90556"
        (Missing operator before C37A90556?)
syntax error at -e line 1, near "s/<key>DefaultVolume</key>                <string>91630106-4A1F-4C58-81E9-D51877DE2EAB</string"
Illegal octal digit '9' at -e line 1, at end of line
Illegal octal digit '9' at -e line 1, at end of line
Execution of -e aborted due to compilation errors.

Thanks for any help!

php remove all attributes from a tag

Here is my code:

$content2= preg_replace("/<([a-z][a-z0-9]*)[^>]*?(\/?)>/i",'<$1$2>', $content1);

This code removes all attributes from all tags in my website, but what I want is to only remove attributes from the form tag. This is what I have tried:

$content2 = preg_replace("/<form([a-z][a-z0-9]*)[^>]*?(\/?)>/i",'<$1$2>', $content1);

and

$content2 = preg_replace("/<(form[a-z][a-z0-9]*)[^>]*?(\/?)>/i",'<$1$2>', $content1);
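
Both attempts still require extra name characters after "form" ([a-z][a-z0-9]*), so a plain <form ...> tag never matches. A sketch of a pattern that matches the literal tag name instead, written in Python for illustration (strip_form_attributes is a hypothetical name):

```python
import re

def strip_form_attributes(html):
    # 'form' is matched literally; \b stops it from also matching e.g. <formfoo>.
    return re.sub(r'<(form)\b[^>]*?(/?)>', r'<\1\2>', html, flags=re.I)

print(strip_form_attributes('<form action="x" method="post"><input type="text"></form>'))
```

The closing </form> and all other tags are left untouched because they do not start with "<form".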

Python/RegEx: Exclude StopWords

I have a list like this:

stopwords = ['a', 'and', 'is']

and a sentence like this:

sentence = 'A Mule is Eating and drinking'

Expected output:

reduced = ['mule', 'eating', 'drinking']

I have so far:

reduced = filter(None, re.match(r'\W+', sentence.lower()))

Now how would you filter out the stopwords?

Edit: Note the upper to lowercase conversion
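
A minimal sketch of one approach: extract the words with re.findall (re.match only looks at the start of the string and returns a match object, not a list) and filter them against the stop words in a list comprehension. Turning the stop words into a set is an optional choice made here for faster lookups:

```python
import re

stopwords = {'a', 'and', 'is'}
sentence = 'A Mule is Eating and drinking'

# \w+ picks out the words; lower() handles the upper-to-lowercase conversion.
reduced = [w for w in re.findall(r'\w+', sentence.lower()) if w not in stopwords]
print(reduced)
```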

replacing curly brackets and text in it with node

I have a string

var str="Hello my name is {john/www.john.com} and welcome to my {site/www.site.com}."

I have extracted the curly brackets and made an anchor tag out of them, like

<a href="www.john.com">john</a>

What I am trying to do is replace the curly brackets and the content in them with these nodes. Is it possible using RegExp? I have studied RegExp on MDN but still can't figure out the way.
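
A sketch of the substitution itself, assuming every placeholder has the form {label/url} with no nested braces. It is written in Python for illustration; a JavaScript replace() with the same pattern, the g flag and $1/$2 in place of \1/\2 should behave the same way:

```python
import re

def link_braces(text):
    # {label/url} -> <a href="url">label</a>
    return re.sub(r'\{([^/{}]+)/([^{}]+)\}', r'<a href="\2">\1</a>', text)

print(link_braces('Hello my name is {john/www.john.com} and welcome to my {site/www.site.com}.'))
```

This produces an HTML string; inserting real DOM nodes would still require parsing that string or building the elements by hand.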

How to grep just strings "ip" into a file?

Do you guys know some other way?

I am trying to find every way to find a string in some text. I want to find more ways using grep or sed. (Bear in mind that it's case sensitive.)

Every word (string) containing the string "ip", redirecting the output to /root/found:

 grep ip /usr/share/dict/words > /root/found

Only words (strings) starting with "ip", redirecting the output to /root/found:

 grep ^ip /usr/share/dict/words > /root/found

Only the word "ip", redirecting the output to /root/found:

 grep ^ip$ /usr/share/dict/words > /root/found

Nginx space on regex condition

I'm trying to block some user-agents on Nginx, and it's working great:

location /page {
  if ($http_user_agent ~* (badUserAgent1|badUserAgent2) ) {
   add_header 'Content-Type' 'text/plain';
   return 200 'Hello World';
  }
 }

But if I add a string with space, I get an error:

invalid condition "$http_user_agent"

Example:

location /page {
  if ($http_user_agent ~* (badUserAgent1|bad user agent 2) ) {
   add_header 'Content-Type' 'text/plain';
   return 200 'Hello World';
  }
 }

I tried to use apostrophes like this: ("badUserAgent1"|"bad user agent 2") without success.

Validate a datetime using: Y-m-d H:m:s in javascript

Pretty simple question: how do I validate a datetime so that the input is both in the correct format and a valid date, unlike 2015-02-30?

2015-06-28 16:06:35 //Valid
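
The two requirements can be checked in two steps: a regex for the exact shape, then a real date parser for calendar validity, since a regex alone cannot easily know that February has no 30th. The sketch below shows the idea in Python (an illustration only, the question is about JavaScript; is_valid_datetime is a hypothetical name):

```python
import re
from datetime import datetime

def is_valid_datetime(value):
    # Shape check first: strptime alone would also accept unpadded fields like '2015-6-28'.
    if not re.fullmatch(r'\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}', value):
        return False
    try:
        datetime.strptime(value, '%Y-%m-%d %H:%M:%S')
    except ValueError:
        return False
    return True

print(is_valid_datetime('2015-06-28 16:06:35'), is_valid_datetime('2015-02-30 16:06:35'))
```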

Regex matching multiple delimiters

I am trying to split on the following delimiters: full stop, semi-colon, *, +, ? and -

I tried the following but I am not making any progress, any help will be appreciated:

sentences = re.split("[.-;]*[\+]*[\?]*[\*]*", txt)

here is the sample text I've been trying this on:

- Text Editor: Now you can edit plain text files with airport tools
* Updated Dropbox support 
* Improved
stability
- New icon

Expected output after the split is a list of items:

TextEditor: Now you can edit plain text files with airport tools, Updated Dropbox support, Improved stability, New icon
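
Two things in the attempted pattern are worth noting: inside a character class, .-; is a range (every character from '.' to ';', which includes the digits, '/' and ':'), and a pattern made only of starred parts can match the empty string. A sketch that puts all six delimiters in one class, with the hyphen last so it is taken literally, and then tidies the pieces:

```python
import re

txt = """- Text Editor: Now you can edit plain text files with airport tools
* Updated Dropbox support
* Improved
stability
- New icon"""

# One class with all delimiters; a trailing '-' inside [...] is a literal hyphen.
parts = re.split(r'[.;*+?-]', txt)

# Drop empty pieces and collapse inner line breaks and repeated spaces.
items = [' '.join(p.split()) for p in parts if p.strip()]
print(items)
```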

Scrapy LinkExtractor - which RegEx to follow?

I'm trying to scrape a category from amazon but the links that I get in Scrapy are different from the ones in the browser. Now I am trying to follow the next page trail and in Scrapy (printed response.body into a txt file) I see those links:

<span class="pagnMore">...</span>
<span class="pagnLink"><a href="/s?ie=UTF8&page=4&rh=n%3A2619533011%2Ck%3Apet%20supplies%2Cp_72%3A2661618011%2Cp_n_date_first_available_absolute%3A2661609011" >4</a></span>
<span class="pagnCur">5</span>
<span class="pagnLink"><a href="/s?ie=UTF8&page=6&rh=n%3A2619533011%2Ck%3Apet%20supplies%2Cp_72%3A2661618011%2Cp_n_date_first_available_absolute%3A2661609011" >6</a></span>
<span class="pagnMore">...</span>
<span class="pagnDisabled">20</span>
<span class="pagnRA"> <a title="Next Page"
                   id="pagnNextLink"
                   class="pagnNext"
                   href="/s?ie=UTF8&page=6&rh=n%3A2619533011%2Ck%3Apet%20supplies%2Cp_72%3A2661618011%2Cp_n_date_first_available_absolute%3A2661609011">
<span id="pagnNextString">Next Page</span>

I'd like to follow the pagnNextString link, but my spider doesn't even start crawling:

Rule(SgmlLinkExtractor(allow=("n\%3A2619533011\%", ),restrict_xpaths=('//*[@id="pagnNextLink"]',)) , callback="parse_items", follow= True),

If I get rid of the rule or use something like '^http.*' it works, but it follows everything. What am I doing wrong here?

How to transform the current tab into an image and after that into text?

The problem I have is this: there are 2 kinds of questions (see them below). There is a website where questions pop up, and the 3 fastest people to answer get a score for that. I want to build a database of questions, but the OCR method I should use is a bit abstract to me. (NOTE: you can't click the text or select it. It's like an image in the browser.) Should I save the whole page as an image and then parse the image to get the text?

http://ift.tt/1LOsAup "First type of questions" http://ift.tt/1LvSsgT "Second type of questions"

The method should be very fast and finish the whole task in 1 or 2 seconds at most. My problem is that, to begin with, I want to save the question and later the answer, but for now I have no idea how to process the question in order to save it.

I would like to know a method to use a C# library for OCR. The data storage will not be a problem.

Sublime Text Syntax Highlighting – override one pattern with another

I have recently created my own syntax highlighting whose purpose is to help me process post-OCR files. Basically, it colors various scripts, for instance all Latin-derived scripts in green, Cyrillic in orange and numbers in blue. Another function is to highlight all sorts of possible trash. For instance, some dictionaries use ♦ to mark certain phraseologisms. However, OCR usually treats them as <|>, <ф>, «~► etc. I have a regex for that:

(; [•\<«'~◄\-■][СC0фóJГOО\(\)56\^\*£\$\&§][•\>»'~►\-■] )|(; [•\<«'~◄\-■][СC0фóJГO\(\)56\^\*£\$\&§] )|(; [СC0фóJГO\(\)56\^\*£\$\&§][•\>»'~►\-■] )|(; <> )

The regex works in the search function, but for whatever reason, it either works partially (highlights everything in red except for the letters, which retain their script-based colors) or not at all (I actually moved the whole block writing this, but when I moved it back it didn't work at all).

Here's the YAML-tmLanguage file on pastebin.

So, how do I make such regexes work? What can I do to make them globally override other syntax definitions?

Replace all occurrence of pattern in a string

I have a string that looks like this:

{"myObject":"{ \"timestamp\" : \"123\" , \"data\" : {\"description\": \"sample\" , \"number\": \"123\"}

What I need to do is to replace all \" with " only. What is the regex for this?

I have tried inputString.replace(/\\"/g, '"'); but it's not working
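
One thing worth checking is whether the return value is being used: in JavaScript, as in Python, replace() returns a new string and leaves the original untouched. Another is whether the backslashes are really in the data or only appear in how the string is displayed. The operation itself, sketched in Python with a shortened sample string:

```python
raw = r'{"myObject":"{ \"timestamp\" : \"123\" }"}'

# A plain (non-regex) replace of backslash-quote with quote; the result must be kept.
cleaned = raw.replace('\\"', '"')
print(cleaned)
```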

Regex to remove punctuation from tokenized text

I'm trying to remove punctuation from tokenized text using regular expressions. Can anyone explain the following behaviour:

$ STRING='hey , you ! what " are you doing ? say ... ," what '
$ echo $STRING | sed -r 's/ [^[:alnum:][:space:]-]+ / /g;'
hey you what are you doing say ," what
$ echo $STRING | sed -r 's/ [[:punct:]]+ / /g;'
hey you what are you doing say ," what
$ echo $STRING | perl -pe 's/ [^[:alnum:][:space:]-]+ / /g;'
hey you what are you doing say ," what
$ echo $STRING | perl -pe 's/ [[:punct:]]+ / /g;'
hey you what are you doing say ," what

The ," token is preserved in the output, which I don't want. It's possible to match this token with:

$ echo $STRING | perl -pe 's/ [",]+ / /g;'
hey you ! what are you doing ? say ... what
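
A probable explanation: each match consumes the space after the token, so when two punctuation tokens are adjacent ("..." followed by ,") the second one no longer has a leading space available to match. Checking the trailing space with a lookahead instead of consuming it avoids this. A Python sketch (the same lookahead should also work in the perl one-liners; the character class is a rough stand-in for the POSIX classes used above):

```python
import re

def drop_punct_tokens(text):
    # The lookahead checks for the following space without consuming it,
    # so that space can still open the next match.
    return re.sub(r' [^\w\s-]+(?= )', '', text)

print(drop_punct_tokens('hey , you ! what " are you doing ? say ... ," what '))
```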

URL rewrite not found

My URL looks like this

http://ift.tt/1TWVaiZ

and I want to rewrite URLs like this

http://ift.tt/1GTUlAH

I thought this was the way

RewriteEngine On
RewriteBase /validar/
RewriteRule ^/([^/]*)/([^/]*)/$/api.php?desde=$1&que=$2 [L]

But I'm getting a 404 and can't make it work

Any ideas? Thanks

PHP: filter specific pattern out of string

My raw output of socket_recvfrom is:

ID IP PING IDENTIFIERNUMBER USERNAME

0 127.0.0.1:1234 0 ID123456789 Moritz

1 127.0.0.1:1234 46 ID123456789 August Jones

2 127.0.0.1:1234 46 ID123456789 Miller

It is a single string that contains all of this information at once, with only whitespace between the fields. All fields can be longer or shorter.

My problem:

When I preg_split("/\s+/") it, then I get a good array with useable data, but when the username contains spaces it creates a second index for this. Not good, all data that comes after this just get destroyed.

I sort the array like this: ID, USERNAME, PING, IDENTIFIERNUMBER, IP

Example by the sorting output with username with one space in it:

ID: 0, USERNAME: Moritz, PING: 0, IDENTIFIERNUMBER: ID123456789, IP: 127.0.0.1:1234

ID: 1, USERNAME: August, PING: Jones, IDENTIFIERNUMBER: 46, IP: ID123456789

ID: 127.0.0.1:1234, USERNAME: 2, PING: Miller, IDENTIFIERNUMBER: 46, IP: ID123456789

How do I get the information correctly out of the string?

Just forgot to say:

The string begins with a run of dashes (---------------------------------) of variable length, so it can be 10 characters or 12. The string ends with:

 (8 users in total)

The regex method looks good. I only need to filter out the other characters.

--------------------------------- 0 127.0.0.1:1234 0 ID123456789(OK) Moritz 1 127.0.0.1:1234 46 ID123456789(OK) August Jones 2 127.0.0.1:1234 46 ID123456789(OK) Miller (7 users in total)

Last problem: http://ift.tt/1eRW0xy
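
A sketch of a row pattern that tolerates spaces in the username by letting the name run until the next row (or the "(N users in total)" footer) begins. It is written in Python for illustration and assumes, based only on the sample above, that addresses are IPv4:port and that identifiers look like "ID" plus digits with an optional "(OK)" suffix:

```python
import re

ROW = re.compile(
    r'(\d+)\s+'                          # ID
    r'(\d{1,3}(?:\.\d{1,3}){3}:\d+)\s+'  # IP:port
    r'(\d+)\s+'                          # PING
    r'(ID\d+)(?:\(OK\))?\s+'             # IDENTIFIERNUMBER, optional "(OK)" suffix
    r'(.+?)'                             # USERNAME, may contain spaces
    # Stop the username where the next row, the footer, or the end of the string begins.
    r'(?=\s+\d+\s+\d{1,3}(?:\.\d{1,3}){3}:\d+|\s+\(\d+ users in total\)|\s*$)'
)

raw = ('--------------------------------- 0 127.0.0.1:1234 0 ID123456789(OK) Moritz '
       '1 127.0.0.1:1234 46 ID123456789(OK) August Jones '
       '2 127.0.0.1:1234 46 ID123456789(OK) Miller (7 users in total)')

rows = ROW.findall(raw)
for row in rows:
    print(row)
```

The leading dashes need no special handling because the pattern simply starts matching at the first row. A username that itself looks like the start of a row would still confuse it.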

Regex PHP: Get specific content from a block of code from another website

I have a site from which I want to get specific content from 7 posts. All 7 posts have the same HTML layout (see below):

<div class="eventInfo">
<h3>Z's（矢沢永吉）</h3>
  <h4>Z's TOUR 2015</h4>

<dl>
    <dt><img src="/event/img/btn_day.png" alt="公演日時" width="92" height="20"> </dt>
    <dd>
      <table width="99%" border="0" cellpadding="0" cellspacing="0">
        <tbody><tr>
      <td width="9%" nowrap="nowrap">2015年6月</td>
      <td width="74%">4日 (木) 19:00開演</td>
    </tr>

  </tbody></table>
</dd>
<dt><img src="/event/img/btn_price.png" alt="料金" width="92" height="20"> </dt>
<dd>S¥10,500　A¥7,500 (全席指定・消費税込）<br><span class="attention">※</span>注意事項の詳細を<a href="http://ift.tt/1HpdVrL" target="_blank">矢沢永吉公式サイト</a>より必ずご確認ください</dd>

<dt><img src="/event/img/btn_ticket.png" alt="一般発売" width="92" height="20"> </dt>
<dd>
 <table width="99%" border="0" cellpadding="0" cellspacing="0">
  <tbody><tr>
    <td width="9%" nowrap="nowrap">2015年5月</td>
    <td width="74%">16日(土)</td>
  </tr>
</tbody></table>
  </dd>

  <dt><img src="/event/img/btn_contact.png" alt="お問合わせ" width="92" height="20"> </dt>
  <dd><a href="http://www.siteurl.com/" target="_blank">ソーゴー大阪</a>　06-6344-3326</dd>

  <dt><img src="/event/img/btn_info.png" alt="公演詳細" width="92" height="20"> </dt>
  <dd><a href="http://ift.tt/1GF5ySz" target="_blank">http://ift.tt/1HpdVrN; </dd>
</dl>
</div>

I just want to fetch the H3 from this layout and the first table in the code. What regex method should I use to get the desired results?

Also, there are 7 posts just like the code above, and I have to get the H3 and the first table from each of them.

I have tested this, but I am not sure whether it is the correct way: http://ift.tt/1GF5z8P

But as you can see, I have to include unwanted data too, like H4, DT and IMG :(

This regex doesn't work in c++

It is supposed to match "abababab", since "ab" is repeated more than two times consecutively, but the code isn't printing any output. Is there some other trick to using regex in C++?

I tried with other languages and it works just fine.

#include<bits/stdc++.h>

int main(){

  std::string s ("xaxababababaxax");
  std::smatch m;
  std::regex e ("(.+)\1\1+");   

   while (std::regex_search (s,m,e)) {
    for (auto x:m) std::cout << x << " ";
    std::cout << std::endl;
    s = m.suffix().str();
  }

  return 0;
}
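
A likely cause is string-literal escaping rather than the regex engine: in a normal C++ string literal, "\1" is an octal escape for the character with code 1, so the regex never sees a backreference. Writing "(.+)\\1\\1+" or a raw literal such as R"((.+)\1\1+)" should fix it. Python has the same pitfall, which makes it easy to demonstrate:

```python
import re

s = 'xaxababababaxax'

# Without the raw-string prefix, '\1' is the single character U+0001, not a backreference.
assert '\1' == '\x01'
broken = re.search('(.+)\1\1+', s)

# With a raw string, the regex engine receives the backslash and sees backreferences.
fixed = re.search(r'(.+)\1\1+', s)

print(broken)
print(fixed.group(0), fixed.group(1))
```

Languages where the original pattern "works just fine" typically have regex literals, or string quoting in which \1 is not an escape sequence.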

Redirect Loops and .htaccess

I just moved from a CentOS dedi to an Ubuntu VPS. The site is custom coded PHP.

Frontend works fine (including rewrite rules). The admin backend I can't get rewrite rules to work...

First error:

H00124: Request exceeded the limit of 10 internal redirects due to probable configuration error. Use 'LimitInternalRecursion' to increase the limit if necessary. Use 'LogLevel debug' to get a backtrace.

Then after using debug level:

AH00122: redirected from r->uri = /admin/index.php

The relevant bits of my htaccess are:

# mod_rewrite set:

Options +Includes

RewriteEngine on

# Administration
RewriteCond %{REQUEST_URI} ^(/+)admin/(.*)$
RewriteRule (.*) %{DOCUMENT_ROOT}/admin/index.php [L,QSA]

# Rewrite orther
RewriteCond %{REQUEST_URI} !^(/+)index.php(.*)$
RewriteCond %{REQUEST_URI} !^(/+)syscmd.php$
RewriteRule ^(.*)$ %{DOCUMENT_ROOT}/index.php?page=$1 [L,QSA]

# If Rewriting Failure, Show error message (Internal backup)
RewriteCond %{REQUEST_URI} !^(/+)index.php$
RewriteCond %{REQUEST_URI} !^(/+)syscmd.php$
RewriteRule (.*) \1 [F]

This was working fine on CentOS too.

Any ideas? I already tried adding the following as the first condition:

RewriteCond %{REQUEST_URI} !/admin/ [NC]

That stopped it rewriting /admin completely.

Thanks

Grouping Output Pattern in Regex- Python

I would like to get only the sentences that match the regex pattern as output and avoid these None values. How do I group the output that matches the pattern?

import re
regex = re.compile('(.*)(?:India)')
with open("D:/txt_res/abc.txt") as f:
    for line in f:
        result = regex.search(line)
        print(result)

The output I'm getting is

None

<_sre.SRE_Match object; span=(0, 101), match='Email: abc.bitz@gmail.com >

None

<_sre.SRE_Match object; span=(0, 47), match='XYZ Engineer at ABC Organization, India'>

None

<_sre.SRE_Match object; span=(0, 32), match='Intern at S360, India'>

None
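
search() returns None for every line that does not match, and printing the result object shows either that None or the match object's repr. A sketch that skips non-matching lines and keeps only the matched text; it takes any iterable of lines, so an open file can be passed in directly (the function name is hypothetical):

```python
import re

regex = re.compile(r'.*India')

def matching_sentences(lines):
    results = []
    for line in lines:
        match = regex.search(line)
        if match:                      # search() returns None when nothing matches
            results.append(match.group(0))
    return results

sample = ['Some header\n', 'Intern at S360, India\n', '\n', 'XYZ Engineer at ABC Organization, India\n']
print(matching_sentences(sample))
```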

Using replace() replaces too much content

I'm using replace() to transform t into g when t is not followed by the letter p. I'm using this line of code:

"tpto".replace(/(t)[^p]/g, "g");

However, the result of this function is tpg and I was expecting tpgo. As I don't know which letter will follow the t, I need something dynamic, but I don't know what to do. Any ideas?
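
The pattern t[^p] matches two characters, the t and whatever follows it, so both are replaced. A negative lookahead checks the next character without making it part of the match. Sketched in Python; the JavaScript form would be "tpto".replace(/t(?!p)/g, "g"):

```python
import re

# (?!p) asserts "not followed by p" without consuming the next character.
result = re.sub(r't(?!p)', 'g', 'tpto')
print(result)
```

Unlike [^p], the lookahead also succeeds when the t is the last character of the string.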

Java String Replace Using Regular Expression

Original String: Flexible Premium Deferred Annuity (Policy #0410011)

Expected String : Flexible Premium Deferred Annuity

Would appreciate it if someone could provide Java code to accomplish this.

Thanks.
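
A sketch of the pattern, shown in Python for illustration and assuming the suffix always has the form " (Policy #<digits>)" at the end of the string. In Java the same pattern could be passed to replaceAll as "\\s*\\(Policy #\\d+\\)\\s*$".

```python
import re

def strip_policy_suffix(title):
    # Assumes the suffix always looks like " (Policy #<digits>)" at the end of the string.
    return re.sub(r'\s*\(Policy #\d+\)\s*$', '', title)

print(strip_policy_suffix('Flexible Premium Deferred Annuity (Policy #0410011)'))
```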

Download site with selected CGI input

So I have this script:
http://ift.tt/1JrcpWF
How do I download, for example, the file for input 1880: http://ift.tt/1SUjWig from this site?
They are all CGI files, and when I enter http://ift.tt/Y9onHf, it gives me the output for 2013. But I checked the source of the pages and there are inputs for year and number of entries. So how do I select the entry I need using Python?

Extracting using a string pattern in Regex- Python

Can't we give a string in the regex? For example, re.compile('((.*)?=<Bangalore>)'). In the code below I have mentioned <Bangalore>, but it's not displaying.

I want to extract the text before Bangalore.

import re

regex = re.compile('((.*)?=<>)')

line = ("Kathick Kumar, Bangalore who was a great person and lived from 29th March 1980 - 21 Dec 2014")

result = regex.search(line)

print(result)

Desired output: Kathick Kumar, Bangalore
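
A literal word can be written directly in the pattern. Since the desired output includes "Bangalore" itself, the sketch below matches lazily up to and including the first occurrence; to exclude the word, it could be moved into a lookahead, (?=Bangalore):

```python
import re

line = ("Kathick Kumar, Bangalore who was a great person and lived from 29th "
        "March 1980 - 21 Dec 2014")

# '.*?' is lazy, so the match ends at the first 'Bangalore'.
match = re.search(r'.*?Bangalore', line)
result = match.group(0) if match else None
print(result)
```

Printing match.group(0) rather than the match object gives the text instead of the object's repr.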

How to censor website links?

I've been working on a regex censor for quite some time and can't seem to find a decent way of censoring address links (and attempts to circumvent the censor).

Here's what I got so far, ignoring escape sequences:

([a-zA-Z0-9_-]+[\\W[_]]*)+(\\.|[\\W]?|dot|\\(\\.\\)|[\\(]?dot[\\)]?)+([\\w]{2,6})((\\.|[\\W]?|dot|\\(\\.\\)|[\\(]?dot[\\)]?)([\\w]{1,4}))*

I'm not sure what might be causing the problem, but it censors the words "com" and "come" and pretty much anything that is about 3+ letters long.

Problem: I want to know how to censor website links and invalid links that are attempts to circumvent the censor. Examples:

Google.com

goo gle .com

g o o g l e . c o m

go o gl e % com

go og le (.) c om

Also a slight addition, is there a possible way to add links to a white list for this? Thank you.

regex encapsulation

I've got a question concerning regex.

I was wondering how one could replace an encapsulated text, something like {key:23}, with something like <span class="highlightable">23</span>, so that the entity still remains encapsulated, but by something else.

I will do this in JS, but the regex is what is important, I have been searching for a while, probably searching for the wrong terms, I should probably learn more about regex, generally.

In any case, is there someone who knows how to perform this operation with simplicity?

Thanks!
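
A sketch of the substitution with a capture group, assuming the placeholders always have the form {name:value} with no nested braces. It is shown in Python; in JavaScript the same pattern would be used with the g flag and $1 instead of \1:

```python
import re

def highlight(text):
    # Capture the value after the colon and re-wrap it in a span.
    return re.sub(r'\{\w+:([^{}]+)\}', r'<span class="highlightable">\1</span>', text)

print(highlight('Total: {key:23} items'))
```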

Saturday, 27 June 2015

Converting perl snippet to java

I want to rewrite a perl code in java:

sub validate_and_fix_regex {
    my $regex = $_[0];
    eval { qr/$regex/ };
    if ($@) {
        $regex = rquote($regex);
    }
    return $regex;
}

sub rquote {
    my $string = $_[0] || return;
    $string =~ s/([^A-Za-z_0-9 "'\\])/\\$1/g;
    return $string;
}

The code takes a regex and, if it does not compile, escapes its special characters. I can't find any equivalent for eval { qr/$regex/ }; and $string =~ s/([^A-Za-z_0-9 "'\\])/\\$1/g; in Java.
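
The general shape of such a port is "try to compile, and escape on failure". The sketch below shows that shape in Python as an analogy, not as Java code. In Java the corresponding pieces would be Pattern.compile, which throws PatternSyntaxException for an invalid pattern, and Pattern.quote, which wraps the text in \Q...\E rather than backslash-escaping each character as rquote does.

```python
import re

def validate_and_fix_regex(regex):
    try:
        re.compile(regex)          # plays the role of eval { qr/$regex/ }
    except re.error:
        return re.escape(regex)    # plays the role of rquote()
    return regex

print(validate_and_fix_regex('a+b'))
print(validate_and_fix_regex('a(b'))
```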

Regex Look Ahead

Today, for a project, I was trying to make use of regular expressions and learnt about groups and how to use them. I am using this site to test it. The problem is that whenever I write the following regex:

(?=\S*\d)

, the site gives me an error : the expression can match 0 characters and therefore can match infinitely.

while this doesn't throw any error :

(?=\S*\d)(\S{6,16})

Can anyone explain to me what this error means?
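
The message is most likely a warning about zero-width matches: a lookahead only checks what follows and consumes nothing, so a pattern consisting of a lookahead alone matches an empty string, and a tester that lists all matches would find such an empty match again and again. A short Python sketch of the difference:

```python
import re

# The lookahead alone succeeds but matches zero characters.
m = re.search(r'(?=\S*\d)', 'abc1')
print(repr(m.group(0)), m.span())

# Followed by a consuming part, the overall match has real length.
m2 = re.search(r'(?=\S*\d)(\S{6,16})', 'passw0rd')
print(m2.group(1))
```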

PHP regex strip comma and space from beginning and end of string

I have some strings like this

", One "
", One , Two"
"One, Two "
" One,Two, "
" ,Two ,Three "

and need to remove spaces and/or commas at the beginning and end of the string only. I tried a few regexes with preg_replace(), but they replace all occurrences.

EDIT: Actually would be great to remove all clutter like !@#$%^&*( etc whatever is at the end and beginning of string, but not in between.

Optionally need to make strings look proper by placing word then coma then space then another word (if there's coma one in between words).

Example "One,Two ,Three , Four" into "One, Two, Three, Four".

P.S. Please provide answer as two separate regex as its easier to understand.
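
One way to split this into two regexes, sketched in Python. The function names are invented for the example, and the patterns only cover the comma-and-space case from the sample strings. Both patterns use plain PCRE-style syntax, so they should also be usable with preg_replace() in PHP.

```python
import re


def trim_edges(s):
    """Regex 1: drop spaces and commas at the start and end only."""
    # ^[\s,]+ is anchored to the start, [\s,]+$ to the end, so
    # separators in the middle of the string are left alone.
    return re.sub(r"^[\s,]+|[\s,]+$", "", s)


def normalize_commas(s):
    """Regex 2: rewrite each inner comma as comma plus one space."""
    return re.sub(r"\s*,\s*", ", ", s)


for raw in [", One ", ", One , Two", "One, Two ", " One,Two, ", " ,Two ,Three "]:
    print(repr(normalize_commas(trim_edges(raw))))
```

To strip other clutter as well, the character class `[\s,]` could be swapped for `\W`, at the cost of also removing any punctuation that legitimately ends a string.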

In C#, how can I get the start/end indexes of all the replacements by the Regex.Replace() function

I have made a program to highlight the phrases in the input matched by a given Regex expression on the fly.

However, I want to highlight the replacements in the output panel too. To do this, I need to obtain the indexes and lengths found by Regex.Replace(). Unfortunately, it would seem C# doesn't give access to this data. Have I missed something?

I've thought about working out the indexes manually by accumulating the offsets from the MatchCollection produced by Regex.Matches(). But this is error-prone, and it may not account for the special $ symbol in the replacement expression, which could throw the figures off.

There must be a more elegant way.
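
The bookkeeping can be kept small by building the output one match at a time and recording where each replacement lands. The sketch below shows the idea in Python, with an invented helper name; `m.expand()` resolves group references in the replacement template, so their expanded length is counted correctly. In C#, the analogous pieces would presumably be the `Regex.Replace` overload that takes a `MatchEvaluator`, with `Match.Result()` to expand the replacement pattern.

```python
import re


def replace_with_spans(pattern, template, text):
    """Return (result, spans); spans are (start, length) pairs that
    locate each replacement inside the result string."""
    pieces = []
    spans = []
    out_len = 0   # length of the output built so far
    last = 0      # end of the previous match in the input
    for m in re.finditer(pattern, text):
        untouched = text[last:m.start()]
        replacement = m.expand(template)  # resolves \1, \g<name>, ...
        pieces.append(untouched)
        out_len += len(untouched)
        spans.append((out_len, len(replacement)))
        pieces.append(replacement)
        out_len += len(replacement)
        last = m.end()
    pieces.append(text[last:])
    return "".join(pieces), spans


result, spans = replace_with_spans(r"(\d+)", r"<\1>", "a1b22c")
print(result, spans)
```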

Regex replacing spaces at both ends of a word

http://ift.tt/1ICDGzE

/(^|\s):bin:(\s|$)/gm

It is unable to find and replace the one in the middle. How can I fix that without calling replace() twice?
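
A likely cause is that `(\s|$)` consumes the space after one `:bin:`, so the same space is no longer available to `(^|\s)` for the next one. Lookarounds check the neighbors without consuming them. The sketch below uses Python, with a made-up input and helper name; it relies on lookbehind, so for a JavaScript engine without lookbehind a lookahead-only variant such as `/(^|\s):bin:(?=\s|$)/gm` with `$1` kept in the replacement may be the closer fit.

```python
import re

# (?<!\S) : not preceded by a non-space (start of text or whitespace)
# (?!\S)  : not followed by a non-space (end of text or whitespace)
TOKEN = re.compile(r"(?<!\S):bin:(?!\S)")


def replace_tokens(text, replacement="REPLACED"):
    """Replace every standalone :bin: without consuming its neighbors."""
    return TOKEN.sub(replacement, text)


sample = ":bin: :bin: :bin:"  # hypothetical input

# The original consuming pattern skips the middle token.
consuming = re.sub(r"(^|\s):bin:(\s|$)", r"\1X\2", sample, flags=re.M)

print(consuming)
print(replace_tokens(sample))
```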

variable expansion as a pattern in sed not working

I have a simple script to set several parameters in /etc/ssh/sshd_config:

#! /bin/bash

declare -a param=('Banner' 'ClientAliveInterval' 'ClientAliveCountMax' 'Ciphers' \
'PermitUserEnvironment' 'PermitEmptyPasswords' 'PermitRootLogin' \
'HostbasedAuthentication' 'IgnoreRhosts' 'MaxAuthTries' \
'X11Forwarding' 'LogLevel'\
)

declare -a val=('/etc/issue.net' '300' '0' 'aes128-ctr,aes192-ctr,aes256-ctr' \
'no' 'no' 'no' 'no' 'yes' '4' 'no' 'INFO' \
)

for (( i=0;i<12;i++ ))
do
 #echo "${param[$i]} ${val[$i]}"
  egrep "^[ #]*${param[$i]}.*" /etc/ssh/sshd_config &> /dev/null
   if [ $? -eq 0 ];
    then
       sed -i "s|^[ #]*\$param[$i].*|${param[$i]} ${val[$i]}|1" /etc/ssh/sshd_config
  else
       echo "${param[$i]} ${val[$i]}" >> /etc/ssh/sshd_config
  fi
done;

However, the variable expansion in the sed pattern is not working as desired:

sed -i "s|^[ #]*\$param[$i].*|${param[$i]} ${val[$i]}|1" /etc/ssh/sshd_config

Can someone help me? The array expansion and everything else in the script is fine; I've checked that with an echo printout.

Regex Expressions For Emoji

http://ift.tt/1LneX8A

function custom() {
var str = document.getElementById('original').innerHTML;
var replacement = str.replace(/\B:poop:\B/g,'REPLACED');
document.getElementById('replaced').innerHTML = replacement;
}
custom()

Yes = :poop: should be replaced with "REPLACED". No = :poop: should not be replaced; in other words, it should remain untouched.

Numbers 4, 5 and 6 don't seem to follow the rule provided. I do know why, but I don't have much idea how to combine multiple expressions into one. I have tried many others, but I just can't get them to work the way I want. The odds aren't in my favor.

And yes, this is very similar to how Facebook emoji in the chat box work.

New issue:

http://ift.tt/1ICDGzE

/(^|\s):bin:(\s|$)/gm

It is unable to scan and replace the one in the middle. How can I fix that?

Python regex: Matching a URL

I am confused about the pattern matching in the following expression. I tried to look it up online but couldn't find an understandable explanation:

imgurUrlPattern = re.compile(r'(http://i.imgur.com/(.*))(\?.*)?')

What exactly are the parentheses doing? I understood everything up to the first asterisk, but I can't figure out what is happening after that.
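
Each pair of parentheses is a capture group, numbered by its opening parenthesis. Printing the groups for a sample URL shows what each one picks up. The URL below is made up, and the second pattern is only a suggested variant, not part of the original code.

```python
import re

imgurUrlPattern = re.compile(r'(http://i.imgur.com/(.*))(\?.*)?')

m = imgurUrlPattern.match("http://i.imgur.com/abc.jpg?1")
group1 = m.group(1)  # group 1: the whole URL
group2 = m.group(2)  # group 2: everything after the last slash
group3 = m.group(3)  # group 3: the optional query string

# Because (.*) is greedy it also swallows "?1", so the optional
# third group has nothing left to match and comes back as None.
print(group1, group2, group3)

# Hypothetical variant: a lazy (.*?) plus an end anchor leaves the
# query string for group 3, and the dots are escaped to be literal.
lazy = re.compile(r'(http://i\.imgur\.com/(.*?))(\?.*)?$')
lm = lazy.match("http://i.imgur.com/abc.jpg?1")
print(lm.group(1), lm.group(2), lm.group(3))
```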

Code from Thinking in Java: I do not understand the output

I am trying to learn Java regex and have been stuck on this code for a long while. Please explain how it works. The command-line arguments are: "abcabcabcdefabc" "abc+" "(abc)+" "(abc){2,}"

import java.util.regex.*;

public class PatternMatcher {

    public static void main(String[] args) {
        if (args.length < 2) {
            System.out.println("Usage:\njava TestRegularExpression " +
                "characterSequence regularExpression+");
            System.exit(0);
        }

        System.out.println("Input: \"" + args[0] + "\"");
        for (String arg : args) {
            System.out.println("Regular expression: \"" + arg + "\"");
            Pattern p = Pattern.compile(arg);
            Matcher m = p.matcher(args[0]);

            while (m.find()) {
                System.out.println("Match \"" + m.group() + "\" at positions " +
                    m.start() + "-" + (m.end() - 1));
            }
        }
    }
}

Output:

Input: "abcabcabcdefabc"
Regular expression: "abcabcabcdefabc"
Match "abcabcabcdefabc" at positions 0-14
Regular expression: "abc+"
Match "abc" at positions 0-2
Match "abc" at positions 3-5
Match "abc" at positions 6-8
Match "abc" at positions 12-14
Regular expression: "(abc)+"
Match "abcabcabc" at positions 0-8
Match "abc" at positions 12-14
Regular expression: "(abc){2,}"
Match "abcabcabc" at positions 0-8
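
The differences come down to what each quantifier applies to: in `abc+` the `+` repeats only the `c`, while in `(abc)+` and `(abc){2,}` it repeats the whole group. The short Python sketch below reproduces the same matches; the helper `find_all` is made up for this example and reports inclusive end positions the way the Java program does.

```python
import re


def find_all(pattern, text):
    """List (match, start, inclusive end) for each successive match."""
    return [(m.group(), m.start(), m.end() - 1)
            for m in re.finditer(pattern, text)]


text = "abcabcabcdefabc"

# "abc+": "ab" followed by one or more "c" -> four separate "abc" hits.
print(find_all(r"abc+", text))

# "(abc)+": one or more whole "abc" groups, taken greedily.
print(find_all(r"(abc)+", text))

# "(abc){2,}": at least two groups, so the lone "abc" at 12-14 fails.
print(find_all(r"(abc){2,}", text))
```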

Python regex: use a capture group to define another group's length {n}

I am parsing hex data with python regex. I have the following packet structure:

'\xaa\x01\xFF\x44'

\xaa - start of packet
\x01 - data length [value can vary from 00-FF]
\xFF - data
\x44 - end of packet

I want to use a Python regex in which the length byte determines how much of the data portion of the packet to match, like this:

r = re.compile('\xaa(?P<length>[\x00-\xFF]{1})(.*){?P<length>}\x44')

This compiles without errors, but it doesn't work (I suspect because it cannot convert the hex value to an appropriate integer). Is there a way to accomplish this in Python?

Background: I have been using Erlang for packet unpacking and was looking for something similar in Python.
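
As far as I know, Python's `re` module has no way to feed a captured value into a `{n}` quantifier, so one alternative is to read the length byte and slice. The sketch below assumes Python 3 `bytes` and the four-part layout described above; the names `START`, `END` and `parse_packet` are invented for the example.

```python
START = 0xAA  # start-of-packet marker from the layout above
END = 0x44    # end-of-packet marker


def parse_packet(buf):
    """Return the data bytes of one packet at the start of buf,
    or None if the framing does not fit."""
    if len(buf) < 3 or buf[0] != START:
        return None
    length = buf[1]          # data length, 0x00-0xFF
    end_index = 2 + length   # where the end marker should sit
    if len(buf) <= end_index or buf[end_index] != END:
        return None
    return buf[2:end_index]


print(parse_packet(b"\xaa\x01\xff\x44"))
```

This also handles data that happens to contain the byte `\x44`, which a pattern based on `.*` followed by `\x44` cannot distinguish from the end marker.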

Using arrays in regular expressions?

Does anyone know if there is a way to use an array in a regular expression? Suppose I want to find out whether somefile.txt contains one of an array's elements. Obviously the code below doesn't work, but is there something similar that does?

array = [thing1 thing2 thing3]
file = File.open("somefile.txt")

file.each_do |line|
if /array/.match(line)
puts line
end

Basically I've got a big list of words to search for, and I'd like to avoid something like this:

($somefile =~ /(thing1|thing2|thing3)/)
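
The alternation does not have to be typed by hand: it can be built from the list at run time. The sketch below shows the idea in Python with invented helper names; in Ruby, `Regexp.union(array)` should produce the equivalent pattern directly.

```python
import re


def build_matcher(words):
    """Compile one alternation that matches any of the given words."""
    # re.escape keeps characters such as "." or "+" in a word literal.
    return re.compile("|".join(re.escape(w) for w in words))


def matching_lines(lines, words):
    """Return the lines that contain at least one of the words."""
    matcher = build_matcher(words)
    return [line for line in lines if matcher.search(line)]


words = ["thing1", "thing2", "thing3"]
print(matching_lines(["has thing2 here", "nothing to see"], words))
```

Note that an empty word list yields an empty pattern, which matches every line, so that case may need a separate check.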

Appending a line just after the matched pattern in sed not working

My /etc/pam.d/system-auth-ac has the below auth parameters set:

auth        required      pam_env.so
auth        sufficient    pam_unix.so nullok try_first_pass
auth        requisite     pam_succeed_if.so uid >= 500 quiet
auth        required      pam_deny.so

I want to insert pam_tally2.so just after pam_env.so. So I want it to be:

auth        required      pam_env.so
auth        required      pam_tally2.so onerr=fail audit silent deny=5 unlock_time=900
auth        sufficient    pam_unix.so nullok try_first_pass
auth        requisite     pam_succeed_if.so uid >= 500 quiet
auth        required      pam_deny.so

The script that I'm using is:

#! /bin/bash

grep "pam_tally2" /etc/pam.d/system-auth-ac &> /dev/null
if [ $? -ne 0 ];
then
   sed -i '/^[]*account[]*required[]*pam_unix.so/aauth\trequired\tpam_tally2.so onerr=fail audit silent deny=5 unlock_time=900' /etc/pam.d/system-auth-ac
else
   sed -i 's/.*pam_tally2.*/auth\trequired\tpam_tally2.so onerr=fail audit silent deny=5 unlock_time=900/1' /etc/pam.d/system-auth-ac
fi

But it gives this error:

sed: -e expression #1, char 116: unterminated address regex

What am I doing wrong?

Regex to match only the numbers 1 to 10 also finds them inside larger numbers

I want a regex that matches only the numbers 1 to 10. I use this:

(10|[1-9])

But Notepad++ also finds matches inside larger numbers that contain 1 to 10 (for example, part of 11 or 100). I want to find only the standalone numbers 1 to 10, not numbers that merely contain them.

Sorry for my poor English.
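
One common fix is to wrap the alternation in word boundaries so that it cannot match in the middle of a longer run of digits. The sketch below demonstrates this in Python with a made-up sample string; `\b` is also available in Notepad++'s regular-expression search mode, so `\b(10|[1-9])\b` should behave the same way there.

```python
import re

# \b on each side requires the number to stand alone, so the "1" in
# "11" or the "10" in "100" no longer counts as a match.
ONE_TO_TEN = re.compile(r"\b(?:10|[1-9])\b")


def find_one_to_ten(text):
    """Return the standalone numbers 1 to 10 found in text."""
    return ONE_TO_TEN.findall(text)


print(find_one_to_ten("1 10 11 100 5 25 7"))
```

A word boundary also exists next to punctuation, so the `1` and `5` in `1.5` would still match; lookarounds such as `(?<![\d.])` and `(?![\d.])` could be used instead if decimals matter.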